by subhrm on 6/26/19, 6:29 AM with 373 comments
1- Finding information is trivial
2- You don't need services indexing billions of pages to find any relevant document
In our current internet, we need a big brother like Google or Bing to effectively find any relevant information in exchange for sharing with them our search history, browsing habits etc. Can we design a hypothetical alternate internet where search engines are not required?
by adrianmonk on 6/26/19, 5:10 PM
Indexing isn't the source of problems. You can index in an objective manner. A new architecture for the web doesn't need to eliminate indexing.
Ranking is where it gets controversial. When you rank, you pick winners and losers. Hopefully based on some useful metric, but the devil is in the details on that.
The thing is, I don't think you can eliminate ranking. Whatever kind of site(s) you're seeking, you are starting with some information that identifies the set of sites that might be what you're looking for. That set might contain 10,000 sites, so you need a way to push the "best" ones to the top of the list.
Even if you go with a different model than keywords, you still need ranking. Suppose you create a browsable hierarchy of categories instead. Within each category, there are still going to be multiple sites.
So it seems to me the key issue isn't ranking and indexing, it's who controls the ranking and how it's defined. Any improved system is going to need an answer for how to do it.
by iblaine on 6/26/19, 4:19 PM
I'm old enough to remember sorting sites by "new" to see what new URLs were being created, and getting to the bottom of that list within a few minutes. Google and search in general were a natural response to that problem as the number of sites added to the internet grew exponentially... meaning we need search.
by ovi256 on 6/26/19, 1:01 PM
Either you find a way to make information findable in a library without an index (how?!?) or you find a novel way to make a neutral search engine - one that provides as much value as Google but whose costs are paid in a different way, so that it does not have Google's incentives.
by neoteo on 6/26/19, 8:22 AM
by alfanick on 6/26/19, 1:00 PM
What if this new Internet, instead of using URIs based on ownership (domains that belong to someone), relied on topic?
For example:
netv2://speakers/reviews/BW
netv2://news/anti-trump
netv2://news/pro-trump
netv2://computer/engineering/react/i-like-it
netv2://computer/engineering/electron/i-dont-like-it
A publisher of a webpage (same HTML/HTTP) would push their content to these new domains (?) and people could easily access a list of resources (pub/sub-like). Advertisements drive the Internet nowadays, so to keep everyone happy, what if netv2 were neutral, but web browsers were not (which is the case now anyway)? You can imagine that some browsers would prioritise certain entries in a given topic, while others would stay neutral, though it would be harder to retrieve the data you want.
Second thought: Guess what, I'm reinventing NNTP :)
by codeulike on 6/26/19, 9:17 AM
by davidy123 on 6/26/19, 1:22 PM
The early Web wrestled with this; early on it was going to be directories and meta keywords. But that quickly broke down (information isn't hierarchical, meta keywords can be gamed). Google rose up because they use a sort of reputation-based index. In between, there was a company called RealNames that tried to replace domains and search with their authoritative naming of things, but that is obviously too centralized.
But back to Google: they now promote using schema.org descriptions of pages over page text, as do other major search engines. This has tremendous implications for precise content definition (a page that is "not about fish" won't show up in a search result for fish). Google layers it with their reputation system, but these schemas are an important, open feature available to anyone to map the web more accurately. Schema.org is based on Linked Data, its principle being that each piece of data can be precisely "followed." Each schema definition is crafted with participation from industry and interest groups to generally reflect its domain. This open-world model is much more suitable to the Web than the closed world of a particular database (but some companies, like Amazon and Facebook, don't adhere to it, since apparently they would rather keep control of their own worlds; witness Facebook's Open Graph degenerating into something purely self-serving).
by _nalply on 6/26/19, 7:18 AM
If we could kill advertising permanently, we could have an internet as described in the question. It would almost be an emergent feature of the internet.
by quelsolaar on 6/26/19, 5:16 PM
This means that in terms of hardware, you can build your own Google; then you get to decide how it rates things, you don't have to worry about ads, and SEO becomes much harder because there is no longer one target to SEO against. Google obviously doesn't want you to do this (and in fairness Google indexes a lot of stuff beyond keywords from web pages), but it would be very possible to build an open source, configurable search engine that anyone could install, run, and get good results out of.
(Example: The Pirate Bay database, which arguably indexes the vast majority of available music / tv / film / software, was / is small enough to be downloaded and cloned by users)
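A minimal sketch of what the core of such a self-hosted index might look like (the pages, tokenizer, and count-based scoring below are placeholder choices for illustration, not a real crawler or ranking algorithm):

    # Build a tiny inverted index locally, then query it - no third party involved.
    from collections import defaultdict
    import re

    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def build_index(pages):
        """pages: dict of url -> raw text. Returns token -> {url: term count}."""
        index = defaultdict(lambda: defaultdict(int))
        for url, text in pages.items():
            for token in tokenize(text):
                index[token][url] += 1
        return index

    def search(index, query):
        """Naive ranking: sum of term counts across the query tokens."""
        scores = defaultdict(int)
        for token in tokenize(query):
            for url, count in index[token].items():
                scores[url] += count
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    pages = {
        "example.org/a": "open source search engines you can run yourself",
        "example.org/b": "seo tricks to game search ranking",
    }
    print(search(build_index(pages), "open source search"))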
by theon144 on 6/26/19, 8:34 AM
Search engines are there to find and extract information in an unstructured trove of webpages - no other way to process this than with something akin to a search engine.
So either you've got unstructured web (the hint is in the name) and GoogleBingYandex or a somehow structured web.
The latter has been found to be not scalable or flexible enough to accommodate unanticipated needs - and not for lack of trying! This was the default mode of the web until Google came about. Turns out it's damn near impossible to construct a structure for information that won't become instantly obsolete.
by swalsh on 6/26/19, 1:18 PM
Centralization happens because the company owns the data, which becomes aggregated under one roof. If you distribute the data, it removes the walled gardens and multiple competitors should be able to pop up. Whole ecosystems could be built to give us 100 Googles.... or 100 Facebooks, where YOU control your data, and they may never even see your data. And because we're moving back to a world of open protocols, they all work with each other.
These companies aren't going to be worth billions of dollars any more.... but the world would be better.
by alangibson on 6/26/19, 5:01 PM
Fast information retrieval requires an index. A better formulation of the question might be: how do we maintain a shared, distributed index that won't be destroyed by bad actors?
I wonder if the two might have parts of the solution in common. Maybe using proof of work to impose a cost on adding something to the index. Or maybe a proof of work problem that is actually maintaining the index or executing searches on it.
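A rough sketch of the "pay to insert" variant, assuming a hash-based puzzle gates additions to the shared index (the difficulty and entry format are made up for illustration):

    # An index entry is only accepted alongside a nonce whose hash clears a target.
    import hashlib

    DIFFICULTY = 16  # required leading zero bits; arbitrary for the example

    def _hash_value(entry, nonce):
        digest = hashlib.sha256(f"{entry}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big")

    def mine(entry):
        """Spend work to earn the right to add this entry to the index."""
        target = 1 << (256 - DIFFICULTY)
        nonce = 0
        while _hash_value(entry, nonce) >= target:
            nonce += 1
        return nonce

    def verify(entry, nonce):
        """Anyone can check the work cheaply before accepting the entry."""
        return _hash_value(entry, nonce) < (1 << (256 - DIFFICULTY))

    entry = "example.org -> [search, decentralized]"
    nonce = mine(entry)
    print(verify(entry, nonce))  # True: the entry may be merged into the index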
by lefstathiou on 6/26/19, 4:48 PM
1) Determining what percentage of search engine use is driven by the need for a shortcut to information you know exists but don't feel like accessing the hard way
2) Information you are actually seeking.
My initial reaction is that making search engines irrelevant is a stretch. Here is why:
Regarding #1, the vast majority of my search activity involves information I know how and where to find but seek the path of least resistance to access. I can type in "the smith, flat iron nyc" and know I will get the hours, cross street and phone number for the Smith restaurant. Why would I not do this instead of visiting the Yelp website, searching for the Smith, setting my location to NYC, filtering results, etc.? Maybe I am not being open-minded enough, but I don't see how this can be replaced short of reading my mind and injecting that information into it. There needs to be a system to type a request and retrieve the result you're looking for. Another example: when I am looking for someone on LinkedIn, I always google the person instead of using LinkedIn's god-awful search. Never fails me.
2. In the minority of cases where I am actually looking for something, I have found that Google's results have gotten worse and worse over the years. It will still be my primary port of call, and I think this is the workflow that has potential for disruption. Other than an index, I don't know what better alternatives you could offer.
by peteyPete on 6/26/19, 6:56 PM
You can't curate manually. That just doesn't scale. You also can't let just anyone add to the index as they wish, or any/every business will just flood the index with their products... There wouldn't be any difference between whitehat and blackhat marketing.
You also need to be able to discover new content when you seek it, based on relevancy and quality of content.
At the end of the day, people won't be storing the index of the net locally, and you also can't realistically query the entire net on demand. That would be an absolutely insane amount of wasted resources.
All comes back to some middleman taking on the responsibility (google,duckduckgo,etc).
Maybe the solution is an organization funded by all governments, completely transparent, where people who wish to can vote on decisions/direction. So non profit? Not driven by marketing?
But since when has government led with innovation and done so at a good pace? Money drives everything... And without a "useful" amount of marketing/ads etc, the whole web wouldn't be as it is.
So yes, you can... but you won't have access to the same amount of data as easily, and you will likely have a harder time finding relevant information (especially if it's quite new) without having to parse through a lot of crap.
by kyberias on 6/26/19, 7:33 AM
1. Finding information is trivial
2. You don't need services indexing billions of rows to find any relevant document
by fghtr on 6/26/19, 7:26 AM
The evil big brothers may not be necessary. We just need to expand alternative search engines like YaCy.
by azangru on 6/26/19, 7:14 AM
by lxn on 6/26/19, 7:23 AM
With a distributed open search alternative the algorithm is more susceptible to exploits by malicious actors.
Having it manually curated is too much of a task for any organization. If you let users vote on the results... well, that can be exploited as well.
The information available on the internet is too big to make directories effective (like they were 20 years ago).
I still have hope this will get solved one day, but directories and open source distributed search engines are not the solution in my opinion unless there is a way to make them resistant to exploitation.
by VvR-Ox on 6/26/19, 10:23 AM
This phenomenon can be seen throughout many systems we have built - e.g. use of the internet, communication, access to electricity or water. We have to pay profit-maximizing entities for all of this, though it could be covered by global cooperatives who manage this stuff in a good way.
by blue_devil on 6/26/19, 9:17 AM
https://www.nytimes.com/2019/06/19/opinion/facebook-google-p...
by Ultramanoid on 6/26/19, 7:09 AM
Most web sites then also had a healthy, sometimes surprising link section, that has all but disappeared these days.
by d-sc on 6/26/19, 4:17 PM
by vbsteven on 6/26/19, 4:14 PM
Each indexer is responsible for a small part of the web and by adding indexers you can increase your personal search area. And there is some web of trust going on.
Entities like stackoverflow and Wikipedia and reddit could host their own domain specific indexers. Others could be crowdsourced with browser extensions or custom crawlers and maybe some people want to have their own indexer that they curate and want to share with the world.
It will never cover the utility and breadth of Google Search but with enough adoption this could be a nice first search engine. With DDG inspired bang commands in the frontend you could easily retry a search on Google.
With another set of colon commands you can limit a search to one specific indexer.
The big part I am unsure about in this setup is how a frontend would choose which indexers to use for a specific query. Obviously sending each query to each indexer will not scale very well.
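A small sketch of how a frontend might fan a query out to a few chosen indexers and merge the answers (the indexer URLs, the topic mapping, and the JSON response shape are all assumptions, not an existing protocol):

    # Query several indexers in parallel and merge their partial result lists.
    import concurrent.futures
    import json
    import urllib.parse
    import urllib.request

    INDEXERS = {
        "code": ["https://indexer.code.example/search"],
        "encyclopedia": ["https://indexer.wiki.example/search"],
    }

    def query_indexer(url, query):
        try:
            full = f"{url}?q={urllib.parse.quote(query)}"
            with urllib.request.urlopen(full, timeout=2) as resp:
                return json.load(resp)  # assumed: list of {"url": ..., "score": ...}
        except OSError:
            return []  # a slow or dead indexer just contributes nothing

    def federated_search(query, topics):
        urls = [u for t in topics for u in INDEXERS.get(t, [])]
        with concurrent.futures.ThreadPoolExecutor() as pool:
            result_lists = pool.map(lambda u: query_indexer(u, query), urls)
        merged = [r for results in result_lists for r in results]
        return sorted(merged, key=lambda r: r.get("score", 0), reverse=True)

    # federated_search("binary search tree", topics=["code"])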
by dalbasal on 6/26/19, 1:01 PM
I'm not sure what the answer is re: search. But an easier example to chew on might be social media. It doesn't take a Facebook to make one. There are lots of different social networking sites (including this one) that are orders of magnitude smaller in terms of resources/people involved, even adjusting for the size of the userbase.
It doesn't take a Facebook (company) to make Facebook (site). Facebook just turned out to be the prize they got for it. These things are just decided as races: FB got enough users early enough. But if they went away tomorrow, users would not lack for social network experiences. Where they get those experiences is basically determined by network effects, not the product itself.
For search, it doesn't take a Google either. DDG makes a search engine, and they're way smaller. With search, though, it does seem that being a Google helps. They have been "winning" convincingly even without the network effects and moat that make FB win.
by zzbzq on 6/26/19, 4:31 PM
Cliff's notes:
- Apps should run not in a browser, but in sandboxed app containers loaded from the network, somewhere between mobile apps and Flash/Silverlight: mobile apps that you don't 'install' from a store, but navigate to freely like the web. Apps have full access to OS-level APIs (for which there is a new cross-platform standard), but are containerized in a chroot jail.
- App privileges ("this wants to access your files") should be a prominent feature of the system, and ad networks would be required to be built on top of it so the trade-offs are clear to the consumer.
- Search should be a functionality owned and operated by the ISPs for profit and should be a low-level internet feature seen as an extension of DNS.
- Google basically IS the web and would never allow such a system to grow. Some of their competitors have already tried to subvert the web by the way they approached mobile.
by btbuildem on 6/26/19, 7:51 PM
It was like a dark maze, and sometimes you'd find a piece of the map.
Search coming online was a watershed moment -- like, "before search" and "after search"
by chriswwweb on 6/26/19, 1:45 PM
But seriously, I'm not sure it is feasible. I wish the internet could auto-index itself and still be decentralized, where any type of content can be "discovered" as soon as it is connected to the "grid".
The advantage would be that users could search any content without filters, without AI tampering with the order based on some rules... BUT on the other hand, people use search engines because their results are relevant (whatever that means these days), so an internet that is searchable by default would probably never be a good UX and hence would not replace existing search engines. It's not just about the internet being searchable; it would have to solve all the problems search engines have solved in the last ten years too.
by mhandley on 6/26/19, 8:57 AM
Of course those assumptions may not be valid. Content may grow faster than linearly. Content may not all be produced by humans. Storage won't grow exponentially forever. But good content probably grows linearly at most, and maybe even slower if old good content is more accessible. Already it's feasible to hold all of the English Wikipedia on a phone. Doing the same for Internet content is certainly going to remain non-trivial for a while yet. But sometimes you have to ask the dumb questions...
by tooop on 6/26/19, 12:55 PM
by GistNoesis on 6/26/19, 10:45 AM
If you don't have the resources to do so yourself, then you'll have to trust something, in order to share the burden.
If you trust money, then gather enough interested people to share the cost of constructing the index; in the end everyone who trusts you can enjoy the benefits of the whole for themselves, and you are now a search engine service provider :)
Alternatively, if you can't get people to part with their money, you can get by needing only their computation, by building the index in a decentralized fashion. The distributed index can then be trusted, at a small computation cost, by anyone who believes that at least k% of the actors constructing it are honest.
For example if you trust your computation and if you trust that x% of actors are honest :
You gather 1000 actors and have each one compute the index of a 1000th of the data, and publish their results.
Then you have each actor redo the computation on the data of another actor picked at random ; as many times as necessary.
An honest actor will report the disagreement between computations, and then you can tell who the bad actor is (someone you won't ever trust again) by checking the computation yourself.
The probability that there is still a bad actor lying is (1-x)^(x*n) with n the number of times you have repeated the verification process. So it can be made as small as possible, even if x is small by increasing n. (There is no need to have a majority or super-majority here like in byzantine algorithms, because you are doing the verification yourself which is doable because 1000th of the data is small enough).
Actors don't have an incentive to lie, because if they do, they will be provably exposed as liars forever.
Economically with decreasing cost of computation (and therefore decreasing cost of index construction), public collections of indices are inevitable. It will be quite hard to game, because as soon as there is enough interest gathered a new index can be created to fix what was gamed.
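Plugging some numbers into the comment's (1-x)^(x*n) expression shows how quickly the chance of an undetected liar drops as you add verification rounds:

    # x = fraction of honest actors, n = number of verification rounds.
    def undetected_probability(x, n):
        return (1 - x) ** (x * n)

    for x in (0.1, 0.3, 0.5):
        for n in (10, 100, 1000):
            print(f"x={x:.1f} n={n:4d} -> {undetected_probability(x, n):.3e}")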
by cf141q5325 on 6/26/19, 8:52 AM
by wlesieutre on 6/26/19, 1:13 PM
Is there a way to update that idea of websites deliberately recommending each other, but without having it be an upvote/like based popularity contest driven by an enormous anonymous mob? It needs to avoid both easy to manipulate crowd voting like reddit and the SEO spam attacks that PageRank has been targeted by.
Some way to say "I value recommendations by X person," or even give individual people weight in particular types of content and not others?
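A toy sketch of what per-person, per-topic trust weights could look like (the names, topics, weights and URLs below are all invented):

    # Score each site by the weighted endorsements of people you chose to trust.
    from collections import defaultdict

    trust = {
        "alice": {"default": 1.0, "woodworking": 2.0},  # trusted more on one topic
        "bob":   {"default": 0.3},
    }

    endorsements = [  # (person, topic, url)
        ("alice", "woodworking", "https://example.com/joinery-guide"),
        ("bob",   "woodworking", "https://spam.example/buy-now"),
        ("alice", "default",     "https://example.org/essay"),
    ]

    def rank(endorsements, trust):
        scores = defaultdict(float)
        for person, topic, url in endorsements:
            weights = trust.get(person, {})
            scores[url] += weights.get(topic, weights.get("default", 0.0))
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    print(rank(endorsements, trust))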
by topmonk on 6/27/19, 5:49 AM
Then we have individual engines that take this data and choose for the user what to display for that user only. So if the user is unhappy with what they are seeing, they simply plug in another engine.
Probably a blockchain would be good for storing such a thing.
by jonathanstrange on 6/26/19, 8:33 AM
by _Nat_ on 6/26/19, 10:10 AM
Seems like you could access Google/Bing/etc. (or DuckDuckGo, which'd probably be a better start here) through an anonymizing service.
But, no, going without search engines entirely doesn't make much sense.
I suspect that what you'd really want is more control over what your computer shares about you and how you interact with services that attempt to track you. For example, you'd probably like DuckDuckGo more than Google. And you'd probably like Firefox more than Chrome.
---
With respect to the future internet...
I suspect that our connection protocols will get more dynamic and sophisticated. Then you might have an AI-agent try to perform a low-profile search for you.
For example, say that you want to know something about a sensitive matter in real life. You can start asking around without telling everyone precisely what you're looking for, right?
Likewise, once we have some smarter autonomous assistants, we can ask them to perform a similar sort of search, where they might try to look around for something online on your behalf without directly telling online services precisely what you're after.
by gesman on 6/26/19, 3:58 PM
As I see it, a new "free search" internet would rely on specially formatted content for each published page that makes its content easily searchable - likely some tags within existing HTML content to comply with a new "free search" standard.
Open source, distributed agents would receive notifications about new, properly formatted "free search" pages and then index such pages into the public index DB.
Any publisher could release content and notify closest "free search" agent.
Then - just like a blockchain - anyone could download such indexed DB to do instant local searches.
There would be multiple variations of such a DB - from small ones (<1TB) giving just "titles" and "extracts" to satisfy small users, to large ones (multi-TB capacity) for those who need detailed search abilities.
"Free search", distributed agents will provide clutter-free interface to do detailed search for anyone.
I think this idea could easily be picked up by pretty much everyone - everyone would be interested in submitting their content to be easily searchable and in escaping any middleman monopoly that tries to control aspects of searching and indexing.
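One way the publish/notify step could be sketched, assuming a made-up "free-search-keywords" meta tag as the opt-in marker and a plain append-only file standing in for the public index DB:

    # Agent side: receive a ping for a URL, fetch it, extract the opt-in tags,
    # and append the entry to the local copy of the shared index.
    import json
    import re
    import urllib.request

    INDEX_DB = "free_search_index.jsonl"

    def handle_notification(url):
        with urllib.request.urlopen(url, timeout=5) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        match = re.search(
            r'<meta\s+name="free-search-keywords"\s+content="([^"]*)"', html)
        if not match:
            return  # page does not follow the "free search" standard
        entry = {"url": url,
                 "keywords": [k.strip() for k in match.group(1).split(",")]}
        with open(INDEX_DB, "a") as db:
            db.write(json.dumps(entry) + "\n")

    # handle_notification("https://example.org/new-article")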
by hokus on 6/26/19, 8:21 AM
by salawat on 6/26/19, 4:54 PM
The problem is closed algorithms, SEO, and advertising/marketing.
Think about it for a minute. Imagine a search engine that generates the same results for everyone. Since it gives the same results for everyone, the burden of looking for exactly what you're looking for is put back exactly where it needs to be, on the user.
The problem though, is you'll still get networks of "sink pages" that are optimized to show up in every conceivable search, that don't have anything to do with what you're searching for, but are just landing pages for links/ads.
Personally, I liked a more Yellow-Pages-ish net. After you got a knack for picking out the SEO link sinks and discounting them, you were fine. I prefer this to a search provider doing it for you because it teaches you, the user, how to retrieve information better. It meant you were no longer dependent on someone else slurping up info on your browsing habits to try to make a guess at what you were looking for.
by tablethnuser on 6/26/19, 1:48 PM
e.g. someone's list of installed lists might look like:
- New York Public Library reference list
- Good Housekeeping list of consumer goods
- YCombinator list of tech news
- California education system approved sources
- Joe Internet's surprisingly popular list of JavaScript news and resources
How do you find out about these lists and add them? Word of mouth and advertising the old fashioned way. Marketplaces created specifically to be "curators of curators". Premium payments for things like Amazing Black Friday Deals 2019 which, if you liked, you'll buy again in 2020 and tell your friends.
There are two points to this. First, new websites only enter your search graph when you make a trust decision about a curator - trust you can revoke or redistribute whenever you want. Second, your list-of-lists serves as an overview of your own biases. You can't read conspiracy theory websites without first trusting "Insane Jake's Real Truth the Govt Won't Tell You". Which is your call to make! But at least you made a call rather than some outrage optimizing algorithm making it for you.
I guess this would start as a browser plugin. If there's interest let's build it FOSS.
Edit: Or maybe it starts as a layer on top of an existing search engine. Are you hiring, DDG? :P
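As a layer on top of an existing engine, the mechanics could be as simple as this sketch (list names, domains, and weights are invented; results come from whatever backend you already use):

    # Keep only results vouched for by a list you installed, boosted by its weight.
    installed_lists = {
        "nypl-reference":    (1.0, {"nypl.org", "archive.org"}),
        "joes-js-resources": (0.5, {"developer.mozilla.org", "example-js.blog"}),
    }

    def filter_and_boost(results):
        """results: list of (domain, base_score) pairs from any search backend."""
        kept = []
        for domain, base_score in results:
            boost = sum(weight for weight, domains in installed_lists.values()
                        if domain in domains)
            if boost > 0:  # no installed list vouches for it -> it stays invisible
                kept.append((domain, base_score * (1 + boost)))
        return sorted(kept, key=lambda kv: kv[1], reverse=True)

    print(filter_and_boost([("nypl.org", 0.7), ("clickbait.example", 0.9)]))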
by dpacmittal on 6/26/19, 5:47 PM
Can anyone tell me why such an approach wouldn't work?
by 8bitsrule on 6/27/19, 1:13 AM
I regularly use DDG (which claims privacy) for this, and requests can be quite specific. E.g. a quotation "these words in this order" may result in -no result at all-, which is preferable to being second-guessed by the engine.
I wonder how 'search engines are not required' would work without expecting the searcher to acquire expertise in drilling down through topical categories, as attempts like 'http://www.odp.org/' did.
by gexla on 6/26/19, 1:29 PM
First "go-to" for search will be my browser history.
As long as the site I know I'm looking for is in my browser history, then I'll go there and use the search feature to find other items from that site.
Bookmark all the advanced search pages I can find for sites I find myself searching regularly.
Resist mindless searching for crap content which usually just takes up time as my brain is decompressing from other tasks.
For search which is more valuable to me, try starting my search from communities such as Reddit, Twitter or following links from other points in my history.
Maybe if it's not worth going through the above steps, then it's not valuable enough to look up?
NOTE: Sites such as Twitter may not be much better than Google, but I can at least see who is pushing the link. I can determine if this person is someone I would trust for recommendations.
I bet if I did all of the above, I could put a massive dent in the number of search engine queries I do.
Any other suggestions?
by ex3xu on 6/26/19, 6:17 PM
What I would like to see is a human layer of infrastructure on top of algorithmic search, one that leverages the fact that there are billions of people who could be helping others find what they need. That critical mass wasn't available at the beginning of the internet, but it certainly is now.
You kind of have attempts at this function in efforts like the Stack Exchange network, Yahoo Answers, Ask Reddit, tech forums etc., but I'd like to see more active empowerment and incentivization of giving humans the capacity to help other humans find what they need, in a way that would be free from commercial incentives. I envision stuff like maintaining absolutely impartial focus groups, and for commercial search it would be nice to see companies incentivized to provide better-quality goods to game search rather than better SEO.
by ntnlabs on 6/26/19, 9:55 AM
by desc on 6/26/19, 8:11 PM
'Web of trust' has its flaws too: a sufficiently large number of malicious nodes cooperating can subvert the network.
However, maybe we can exploit locality in the graph? If the user has an easy way to indicate the quality of results, and we cluster the graph of relevance sources, the barrier to subverting the network can be raised significantly.
Let's say that each ranking server indicates 'neighbours' which it considers relatively trustworthy. When a user first performs a search their client will pick a small number of servers at random, and generate results based on them.
* If the results are good, those servers get a bit more weight in future. We can assume that the results are good if the user finds what they're looking for in the top 5 or so hits (varying depending on how specific their query is; this would need some extra smarts).
* If the results are poor (the user indicates such, or tries many pages with no luck) those servers get downweighted.
* If the results are actively malicious (indicated by the user) then this gets recorded too...
There would need to be some way of distributing the weightings based on what the servers supplied, too. If someone's shovelling high weightings at us for utter crap, they need to get the brunt of the downweighting/malice markers.
Servers would gain or lose weighting and malice based on their advertised neighbours too. Something like PageRank? The idea is to hammer the trusting server more than the trusted, to encourage some degree of self-policing.
Users could also choose to trust others' clients, and import their weighting graph (but with a multiplier).
Every search still includes random servers, to try to avoid getting stuck in an echo chamber. The overall server graph could be examined for clustering and a special effort made to avoid selecting more than X servers in a given cluster. This might help deal with malicious groups of servers, which would eventually get isolated. It would be necessary to compromise a lot of established servers in order to get enough connections.
Of course, then we have the question of who is going to run all these servers, how the search algorithm is going to shard efficiently and securely, etc etc.
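A back-of-envelope sketch of the client-side part: per-server weights nudged by feedback, plus one random pick per search to stay out of echo chambers (the multipliers and counts are arbitrary):

    import random

    weights = {}  # server -> float, treated as 1.0 when first seen

    def pick_servers(known_servers, k=3, explore=1):
        """Mostly high-weight servers, plus a random extra to avoid echo chambers."""
        ranked = sorted(known_servers, key=lambda s: weights.get(s, 1.0), reverse=True)
        chosen = ranked[:k]
        leftovers = [s for s in known_servers if s not in chosen]
        return chosen + random.sample(leftovers, min(explore, len(leftovers)))

    def record_feedback(server, outcome):
        """outcome: 'good', 'poor' or 'malicious', as indicated by the user."""
        factor = {"good": 1.1, "poor": 0.9, "malicious": 0.2}[outcome]
        weights[server] = min(10.0, max(0.01, weights.get(server, 1.0) * factor))

    servers = [f"rank{i}.example" for i in range(6)]
    for s in pick_servers(servers):
        record_feedback(s, "good")
    print(weights)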
Anyone up for a weekend project? >_>
by gist on 6/26/19, 7:08 PM
by Havoc on 6/26/19, 8:28 AM
To me they are conceptually not the problem. Nor is advertising.
This new wave of track-you-everywhere-with-AI search engines is an issue, though. They've taken it too far, essentially.
Instead of respectable fishing they've gone for kilometer-long trawling nets that leave nothing in their wake.
by hideo on 6/26/19, 4:34 PM
https://www.cs.tufts.edu/comp/150IDS/final_papers/ccasey01.2... http://conferences.sigcomm.org/co-next/2009/papers/Jacobson....
by munchausen42 on 6/26/19, 5:24 PM
E.g., how about an open source spider/crawler that anyone can run on their own machine continuously contributing towards a distributed index that can be queried in a p2p fashion. (Kind of like SETI@home but for stealing back the internet).
Just think about all the great things that researchers and data scientists could do if they had access to every single public Facebook/Twitter/Instagram post.
Okayokay ... also think about what Google and FB could do if they could access any data visible to anyone (but let's just ignore that for a moment ;)
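A minimal sketch of one volunteer node, with an in-memory dict standing in for whatever p2p/DHT layer would actually hold the shared index (the URLs and the crude term filter are placeholders):

    # Crawl a few pages and publish (term -> url) postings to the shared store.
    import re
    import urllib.request
    from collections import defaultdict

    shared_index = defaultdict(set)  # stand-in for the distributed store

    def crawl_and_contribute(urls):
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    text = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue
            for term in set(re.findall(r"[a-z]{3,}", text.lower())):
                shared_index[term].add(url)  # in reality: publish to the p2p layer

    def peer_query(term):
        return shared_index.get(term, set())

    # crawl_and_contribute(["https://example.org/"])
    # print(peer_query("example"))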
by nonwifehaver3 on 6/26/19, 2:31 PM
Due to this I think people will have to use site-specific searches, directories, friend recommendations, and personal knowledge-bases to discover and connect things instead of search engines.
by cy6erlion on 6/27/19, 7:21 AM
1) Have an index created by a centralized entity like Google
2) Have the nodes in the network create the index
The first option is the easiest, but it can be biased in terms of who gets into the index and their position in it.
Option two is hard because we need a sort of mechanism to generate the index from the subjective view of the nodes in the network and sync this to everyone in the network.
The core problem here is not really the indexing but the structure of the internet: domains/websites are relatively dumb, they cannot see the network topology, and indexing is basically trying to create this topology.
by JD557 on 6/26/19, 3:22 PM
Unfortunately (IIRC and IIUC how Gnutella works), malicious actors can easily break that query scheme: just reply to all query requests with your malicious link. I believe this is how pretty much every query in old Gnutella clients returned a bunch of fake results that were simply `search_query + ".mp3"`.
by quickthrower2 on 6/26/19, 6:47 AM
by oever on 6/26/19, 2:55 PM
by inputcoffee on 6/26/19, 2:38 PM
I am being purposefully vague because I don't think people know what an effective version of that would look like, but it's worth exploring.
If you have some data you might ask questions like:
1. Can this network reveal obscure information?
2. When -- if ever -- is it more effective than indexing by words?
by ninju on 6/26/19, 5:52 PM
For long-term facts and knowledge lookup: Wikipedia pages (with proper annotation)
For real-time world happenings: a mix of direct news websites
For random 'social' news: <-- the only time I do a direct Google/Bing/DDG search
The results from the search engines nowadays are so filled with (labeled) promoted results and (un-labeled) SEO results that I have become cynical and jaded about the value of the results.
by jka on 6/26/19, 6:53 PM
Over time the domains that users genuinely organically visit (potentially geo-localized based on client location) should rise in query volume.
Caveats would include DNS record cache times, lookups from robots/automated services, and no doubt a multitude of inconsistent client behavior oddities.
A similar approach could arguably be applied even at a network connection log level.
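A toy version of the aggregation step, assuming a log of (client, domain) lookup pairs and counting distinct clients per domain as the popularity signal:

    from collections import defaultdict

    def popularity(dns_log):
        """dns_log: iterable of (client_ip, domain) lookup pairs."""
        clients_per_domain = defaultdict(set)
        for client_ip, domain in dns_log:
            clients_per_domain[domain].add(client_ip)
        return sorted(((d, len(c)) for d, c in clients_per_domain.items()),
                      key=lambda kv: kv[1], reverse=True)

    log = [("10.0.0.1", "example.org"), ("10.0.0.2", "example.org"),
           ("10.0.0.1", "tracker.example")]
    print(popularity(log))  # example.org (2 clients) outranks tracker.example (1)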
by mahnouel on 6/26/19, 8:18 AM
by z3t4 on 6/26/19, 5:44 PM
by epynonymous on 6/27/19, 1:06 PM
an example use case would be a set of apps that my family could use for photo sharing, messaging, sending data, links to websites, etc. perhaps another set of apps for my friends, another for my company, or school. the protocols would not require public infrastructure, dns, etc. perhaps tethering of devices would be enough. there would still be a need for indexing and search, email, etc.
by sktrdie on 6/26/19, 2:44 PM
You're effectively crawling portions of the web based on your query, at runtime! It's a pretty neat technique. But you obviously have to trust the sources and the links to provide you with relevant data.
by Johny4414 on 6/26/19, 7:19 AM
by CapitalistCartr on 6/26/19, 5:44 PM
by politician on 6/26/19, 5:00 PM
Discovering new sources of information in this kind of environment is difficult, and basically boils down to another instance of the classic key distribution problem - out-of-band, word-of-mouth, and QR codes.
Search engines like Google and Bing solve the source discovery problem by presenting themselves as a single source; aggregating every other source through a combination of widespread copyright infringement and an opaque ranking algorithm.
Google and Bing used to do a great job of source discovery, but the quality of their results has deteriorated under relentless assaults from SEO and Wall Street.
I think it's time for another version of the Internet where Google is not the way that you reach the Internet (Chrome) or find what you're looking for on the Internet (Search) or how you pay for your web presence (Adsense).
by BerislavLopac on 6/26/19, 8:57 AM
What you call the Internet is actually the World Wide Web, just another protocol (HTTP) on top of the Internet (TCP/IP). It was designed to be decentralised but lacked any worthwhile discovery mechanism before two students built BackRub, the search engine that became Google.
by wsy on 6/26/19, 8:04 PM
For example, if you build on a decentralized network, ask yourself how you can prevent SEO companies from adding a huge amount of nodes to promote certain sites.
by rayrrr on 6/26/19, 9:31 PM
by qazpot on 6/26/19, 9:15 AM
Point 4 allows a user to search and retrieve documents on the network.
by hayksaakian on 6/26/19, 4:14 PM
For example, if you want to know where to eat tonight, instead of searching "restaurants near me" you might ask your friends "where should I eat tonight" and get personalized suggestions.
by weliketocode on 6/26/19, 7:54 PM
If you don’t believe finding information is currently trivial using Google, that’s going to be a tough nut to crack.
What would you use for information retrieval that doesn’t involve indexing or a search engine?
by garypoc on 6/26/19, 7:19 AM
by lowcosthostings on 6/28/19, 12:14 PM
by fooker on 6/26/19, 5:25 PM
by siliconc0w on 6/26/19, 5:03 PM
by tmaly on 6/27/19, 12:17 PM
Once we have really fast 5G networks, there is a good possibility that some type of distributed mesh search solution could replace the big players.
by Advaith on 6/26/19, 5:45 PM
You will be able to trust data and sources instantly. There will be no intermediaries and trust will be bootstrapped into each system.
by blackflame7000 on 6/26/19, 6:15 PM
by nobodyandproud on 6/26/19, 1:48 PM
Not a place for entertainment, but where government or business transactions can be safely conducted.
A search engine would be of secondary importance.
by reshie on 6/26/19, 7:09 AM
it sounds like what you really want is a decentralized search engine that is anonymous by default, as opposed to no search engine.
by paparush on 6/26/19, 3:16 PM
by Papirola on 6/26/19, 5:59 PM
by Isamu on 6/26/19, 2:47 PM
Another original intent: that URLs would not need to be user-visible, and you wouldn't need to type them in.
by truckerbill on 6/26/19, 9:59 AM
by thedevindevops on 6/26/19, 7:10 AM
by ken on 6/26/19, 5:18 PM
by kazinator on 6/26/19, 9:47 PM
A user wants to find a "relevant document".
What is that? What information does the user provide to specify the document?
Why does the user trust the result?
by bitL on 6/26/19, 4:30 PM
by comboy on 6/26/19, 1:35 PM
I'm sorry it's a bit long; TL;DR you need to be explicit about the people you trust. Those people do the same, and then, thanks to the small-world effect, you can establish your trust in any entity that is already trusted by some people.
No global ranking is the key. How good some information is, is relative and depends on who you trust (which is basically a form of encoding your beliefs). And yes, you can avoid the information bubble much better than now, but writing more when I'm so late to the thread seems a bit pointless.
by FPurchess on 6/26/19, 7:50 AM
by otabdeveloper4 on 6/26/19, 3:28 PM
Probably not what you had in mind, though. Be careful what you wish for.
by xorand on 6/26/19, 8:00 AM
by robot on 6/26/19, 7:01 PM
by buboard on 6/26/19, 7:48 PM
by ISNIT on 6/26/19, 7:33 AM
by amelius on 6/26/19, 7:55 AM
by ptah on 6/26/19, 1:38 PM
by sys_64738 on 6/26/19, 6:03 PM
by peterwwillis on 6/26/19, 1:27 PM
If you've ever tried to maintain a large corpus of documentation, you realize how incredibly difficult it is to find "information". Even if I know exactly what I want.... where is it? With a directory, if I've "been to" the content before, I can usually remember the path back there... assuming nothing has changed. (The Web changes all the time) Then if you have new content... where does it go in the index? What if it relates to multiple categories of content? An appendix by keyword would get big, fast. And with regular change, indexes become stale quickly.
OTOH, a search engine is often used for documentation. You index it regularly so it's up to date, and to search you put in your terms and it brings up pages. The problem is, it usually works poorly because it's a simple search engine without advanced heuristics or PageRank-like algorithms. So it's often a difficult slog to find documentation (in a large corpus), because managing information is hard.
But if what you actually want is just a way to look up domains, you still need to either curate an index, or provide an "app store" of domains (basically a search engine for domain names and network services). You'd still need some curation to weed out spammers/phishers/porn, and it would be difficult to find the "most relevant" result without a PageRank-style ordering based on most linked-to hosts.
What we have today is probably the best technical solution. I think the problem is how it's funded, and who controls it.
by fergie on 6/26/19, 1:12 PM
"1- Finding information is trivial"
The web already consists, for the most part, of marked-up text. If speed is not a constraint, then we can already search through the entire web on demand; however, given that we don't want to spend 5 years on every search we carry out, what we really need is a SEARCH INDEX.
Given that we want to avoid Big Brother-like entities such as Google, Microsoft and Amazon, and also given, although this is certainly debatable, that government should stay out of the business of search, what we need is a DECENTRALISED SEARCH INDEX.
To do this you are going to need AT THE VERY LEAST a gigantic reverse index that contains every searchable token (word) on the web. That index should ideally include some kind of scoring so that the very best documents for, say, "banana" come at the top of the list for searches for "banana" (You also need a query pipeline and an indexing pipeline but for the sake of simplicity, lets leave that out for now).
In theory a search index is very shardable. You can easily host an index that is in fact made up of lots of little indexes, so a READABLE DECENTRALISED SEARCH INDEX is feasible, with the caveat that relevancy would suffer, since relevancy algorithms such as TF-IDF and PageRank generally rely on an awareness of the whole index, not just an individual shard, in order to calculate a score.
Therefore a READABLE DECENTRALISED SEARCH INDEX WITH BAD RELEVANCY is certainly doable, although it would have Lycos-grade performance circa 1999.
CHALLENGES:
1) Populating the search index will be problematic. Who does it, how they get incentivized/paid, and how they are kept honest is a pretty tricky question.
2) Indexing pipelines are very tricky and require a lot of work to do well. There is a whole industry built around feeding data into search indexes. That said, this is certainly an area that is improving all the time.
3) How the whole business of querying a distributed search index would actually work is an open question. You would need to query many shards and then do a Map-Reduce operation that glues together the responses. It may be possible to do this on users' devices somehow, but that would create a lot of network traffic.
4) All of the nice, fancy schmancy latest Google functionality unrelated to pure text lookup would not be available.
"2- You don't need services indexing billions of pages to find any relevant document"
You need to create some kind of index, but there is a tiny sliver of hope that this could be done in a decentralized way without the need for half a handful of giant corporations. Therefore many entities could be responsible for their own little piece of the index.
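A toy version of challenge 3 above - fan the query out to shards and glue the partial hit lists back together - with per-shard term counts standing in for real relevancy (which, as noted, suffers without a whole-index view):

    from collections import defaultdict

    shards = [  # each shard holds its own slice of the reverse index
        {"banana": {"fruit.example/banana": 4}, "bread": {"bake.example/loaf": 2}},
        {"banana": {"smoothie.example/recipes": 1}},
    ]

    def query_shard(shard, token):
        return shard.get(token, {})  # partial postings: url -> count

    def distributed_query(tokens):
        merged = defaultdict(int)
        for shard in shards:              # "map": one request per shard
            for token in tokens:
                for url, count in query_shard(shard, token).items():
                    merged[url] += count  # "reduce": combine the partial scores
        return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

    print(distributed_query(["banana"]))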
by sonescarol on 6/26/19, 4:41 PM
by sonnyblarney on 6/26/19, 4:36 PM
i.e. when you search, you start in a relevant domain instead of Google: Amazon for products, Stack Exchange for CS questions.
Obviously not ideal either.
by diminoten on 6/26/19, 4:02 PM
by wfbarks on 6/26/19, 7:14 AM
by codegladiator on 6/26/19, 6:38 AM
by nojobs on 6/26/19, 12:54 PM
by drenvuk on 6/26/19, 7:14 AM
This is not simple, and your Ask HN reeks of ideology and contempt without so much as an inkling of the technical realities that would have to be overcome for such a thing to happen. That goes for both old and new internet.
/rant