The search engine magic isn't just about caching pages. It's also extremely expensive and complex to:
maintain an index of all the websites in the world, which is extremely costly in itself
refresh that index in almost real time. How long will your self-hosted crawler take to find new content on every website in the world?
weigh the results. Getting the order of results and their relevance right is not easy at all. How many times a word appears on a page is a terrible metric.
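To illustrate that last point, here is a minimal Python sketch of why a raw word count ranks badly. The two page texts and the `raw_count` helper are made up for this example and are not taken from any real engine. A page that just repeats the query word beats a page that actually answers the question.

```python
import re

def raw_count(query_word: str, text: str) -> int:
    """Count how often query_word occurs in text (case-insensitive, whole words)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words.count(query_word.lower())

# Hypothetical pages, invented for this example.
pages = {
    "helpful": "How to repair a bicycle tire: remove the wheel, find the puncture, patch it.",
    "spam": "bicycle bicycle bicycle cheap bicycle buy bicycle bicycle deals",
}

# Rank pages by raw count of the query word, highest first.
ranked = sorted(pages, key=lambda name: raw_count("bicycle", pages[name]), reverse=True)
print(ranked)  # ['spam', 'helpful']
```

The keyword-stuffed page wins with six hits against one, which is exactly the failure mode a real ranking algorithm has to work around.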
A hosted search is also a lot more environmentally friendly - that gigantic search index and all the energy poured into building it can be shared by everyone. If everyone did that themselves at home, you would spend that same amount of energy in every single household.
"Almost real time" is not what I'd call necessary.
Weighting results - I'd expect user feedback (good result or bad result, combined with the keywords from the request) to be good enough, similar to file reputation in ed2k.
The index is going to be big, yes. But a p2p system with split storage and computation, something between Freenet and Ceph, may be doable.
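A rough sketch of what such feedback-based weighting could look like, in Python. Everything here is an assumption made for illustration: the `FeedbackIndex` class, the per-keyword vote counters and the smoothing formula are hypothetical, and this is not how ed2k or any existing search engine actually does it. It also ignores the hard part, which is protecting the votes against spam.

```python
from collections import defaultdict

class FeedbackIndex:
    """Hypothetical store of good/bad votes per (keyword, url) pair."""

    def __init__(self):
        # (keyword, url) -> [good_votes, bad_votes]
        self.votes = defaultdict(lambda: [0, 0])

    def record(self, query: str, url: str, good: bool) -> None:
        """Attach one vote to every keyword of the query that produced the result."""
        for kw in query.lower().split():
            self.votes[(kw, url)][0 if good else 1] += 1

    def score(self, query: str, url: str) -> float:
        """Sum the smoothed share of good votes over the query keywords."""
        total = 0.0
        for kw in query.lower().split():
            good, bad = self.votes.get((kw, url), (0, 0))
            # Add-one smoothing: a result with no votes scores a neutral 0.5.
            total += (good + 1) / (good + bad + 2)
        return total

    def rank(self, query: str, urls: list[str]) -> list[str]:
        """Order candidate urls by feedback score, best first."""
        return sorted(urls, key=lambda u: self.score(query, u), reverse=True)

idx = FeedbackIndex()
idx.record("linux kernel", "a.example", good=True)
idx.record("linux kernel", "a.example", good=True)
idx.record("linux kernel", "b.example", good=False)
print(idx.rank("linux kernel", ["b.example", "c.example", "a.example"]))
# ['a.example', 'c.example', 'b.example']
```

The result with two good votes comes first, the one nobody has rated stays in the middle, and the one with a bad vote drops to the bottom.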
A hosted search is also a lot more environmentally friendly - that gigantic search index and all the energy poured into building it can be shared by everyone. If everyone did that themselves at home, you would spend that same amount of energy in every single household.
With some p2p system of that kind I can imagine the overhead being 10 or maybe 100 times that of Google. But not what you said.