The "content" is bait and as with all bait it is unfit for human consumption. The point of the exercise is to lure the user to click on the ads. You can imagine that when someone gets paid a couple of dollars to write an article about, say, ball bearings (to pick a random topic), yet has absolutely zero knowledge about ball bearings, the content that is being created won't be of high quality.
And there are lots of people and companies that make a decent living out of creating rubbish content.
Search engines have value when their users are happy, but since search engine companies also sell ads, they have a balancing act to perform: content mills generate ad revenue, but at the cost of the search experience. Now that content mills have become so numerous and "fake" content so prolific, this is becoming a real problem.
This raises the question of whether trying to filter out the rubbish is worth doing at all. What if we did the opposite and were able to cherry-pick good content instead? Here are some thoughts:
- An author (blogger, journalist, writer) has a public key pair that is used to sign content, thereby affirming authorship of it. This gives the writer a verifiable identity and a mechanism for associating content with its author.
- A public key pair may or may not be used to identify a real person. It can just as easily be used to affirm that content is produced by the same anonymous source. The problem we want to solve is not to verify the real identity of the author.
- A given piece of content can have multiple signatures attached to it.
- When you read a blog posting or an article, you have three choices. You can mark it as good, neutral or bad. The neutral tag means that you'd like to start tracking this author, but you are not making any value judgement at this time.
- Your choices can be stored locally, with a third-party service or with the search engine. Each of these options has its pros and cons.
- When you perform searches, the search engine can include the verified key id in the search results. Your browser can then use it to score and rank the results locally, or the search engine can be made privy to your preferences and do the scoring and ranking for you. Known bad sources would be omitted; known good sources might get boosted.
- Since it would take you forever to build a significant personal database of "known good" and "known bad" authors, it makes sense to look at collaborative systems for author ratings. For instance, you might want to share ratings with people you trust.
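To make the signing and verification steps in the list above a little more concrete, here is a minimal Python sketch of an author key pair, a signature over a piece of content, several signatures on the same content, and the key ids a search engine could report. To avoid third-party dependencies it uses a toy Schnorr-style signature over the prime 2**127 - 1; that scheme is NOT secure and only shows the shape of the protocol. A real system would use a vetted scheme such as Ed25519 from an established library. The names `KeyPair`, `SignedContent` and `key_id` (a truncated SHA-256 fingerprint of the public key) are hypothetical.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass, field

# TOY parameters, NOT secure: the Mersenne prime 2**127 - 1 and base 3.
# A real deployment would use a standard scheme such as Ed25519.
P = 2**127 - 1
G = 3


def _challenge(r: int, content: bytes) -> int:
    """Hash the commitment r and the content into an exponent mod P - 1."""
    digest = hashlib.sha256(r.to_bytes(16, "big") + content).digest()
    return int.from_bytes(digest, "big") % (P - 1)


def key_id(public: int) -> str:
    """Hypothetical key id: a short fingerprint of the public key."""
    return hashlib.sha256(public.to_bytes(16, "big")).hexdigest()[:16]


@dataclass
class KeyPair:
    secret: int  # kept private by the author
    public: int  # published; identifies the (possibly anonymous) author


def generate_keypair() -> KeyPair:
    x = secrets.randbelow(P - 2) + 1
    return KeyPair(secret=x, public=pow(G, x, P))


def sign(key: KeyPair, content: bytes) -> tuple[int, int]:
    """Schnorr-style signature (e, s) with s = k + x*e mod (P - 1)."""
    k = secrets.randbelow(P - 2) + 1
    e = _challenge(pow(G, k, P), content)
    return e, (k + key.secret * e) % (P - 1)


def verify(public: int, content: bytes, signature: tuple[int, int]) -> bool:
    """Recompute g^k as g^s * y^(-e) and check that the challenge matches."""
    e, s = signature
    r = (pow(G, s, P) * pow(public, -e, P)) % P
    return _challenge(r, content) == e


@dataclass
class SignedContent:
    body: bytes
    # One entry per signer: (public key, signature over body).
    signatures: list[tuple[int, tuple[int, int]]] = field(default_factory=list)

    def add_signature(self, key: KeyPair) -> None:
        self.signatures.append((key.public, sign(key, self.body)))

    def verified_key_ids(self) -> list[str]:
        """Key ids whose signatures still match the current body."""
        return [key_id(pub) for pub, sig in self.signatures
                if verify(pub, self.body, sig)]


if __name__ == "__main__":
    author, co_author = generate_keypair(), generate_keypair()
    article = SignedContent(b"Everything you need to know about ball bearings.")
    article.add_signature(author)
    article.add_signature(co_author)
    print(article.verified_key_ids())  # two key ids
    article.body += b" Click here!"    # tampering invalidates both signatures
    print(article.verified_key_ids())  # []
```

Because the key id is derived from the public key alone, it says nothing about who the author really is; it only lets you recognize the same source again.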
The main objective is to provide you with a simple, non-intrusive mechanism for getting some hints about content quality. It also opens up a wide range of possibilities for collaborative filtering.
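The local scoring described in the list above, combined with ratings shared by people you trust, might look roughly like the following Python sketch. Everything here is an assumption for illustration: the `SearchResult.key_ids` field stands in for the verified key ids a search engine might report, your own marks override shared ratings, shared ratings count for half as much (`friend_weight=0.5`), and a page signed by several authors gets the best of their scores unless one of them is known bad.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    GOOD = 1
    NEUTRAL = 0  # "start tracking this author", no value judgement yet
    BAD = -1


@dataclass
class SearchResult:
    url: str
    engine_score: float  # relevance score assigned by the search engine
    key_ids: list[str]   # verified author key ids (hypothetical field)


def author_score(key: str, mine: dict[str, Rating],
                 shared: list[dict[str, Rating]], friend_weight: float = 0.5) -> float:
    """Your own rating wins; otherwise average the ratings shared by people you trust."""
    if key in mine:
        return float(mine[key].value)
    votes = [ratings[key].value for ratings in shared if key in ratings]
    return friend_weight * sum(votes) / len(votes) if votes else 0.0


def rerank(results: list[SearchResult], mine: dict[str, Rating],
           shared: list[dict[str, Rating]], boost: float = 1.0) -> list[SearchResult]:
    """Drop pages by known bad authors and boost pages by well-rated ones."""
    scored = []
    for result in results:
        if any(mine.get(k) is Rating.BAD for k in result.key_ids):
            continue  # known bad sources are omitted
        bonus = max((author_score(k, mine, shared) for k in result.key_ids), default=0.0)
        scored.append((result.engine_score + boost * bonus, result))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in scored]


if __name__ == "__main__":
    mine = {"key-good": Rating.GOOD, "key-bad": Rating.BAD}
    shared = [{"key-liked-by-friend": Rating.GOOD}]
    results = [
        SearchResult("https://example.com/a", 1.0, ["key-bad"]),
        SearchResult("https://example.com/b", 1.0, []),
        SearchResult("https://example.com/c", 0.8, ["key-good"]),
        SearchResult("https://example.com/d", 0.9, ["key-liked-by-friend"]),
    ]
    print([r.url for r in rerank(results, mine, shared)])
    # c (0.8 + 1.0), d (0.9 + 0.5), b (1.0); a is dropped
```

Running this in the browser keeps your preferences private; running the same function on the search engine's side trades that privacy for convenience.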
I haven't thought this through in much detail, but it is an interesting idea. Aspects of it are not unlike scoring USENET postings in newsreaders that support scoring and filtering, a feature that is sorely missed on the web, especially when performing searches.
As with all half-baked ideas, there is a likelihood that I have overlooked some fundamental problem or problems.
One thing that is immediately obvious is that the mechanisms for signing content and for identifying which content on a page is signed (and which isn't) have to be simple and robust. This is a challenge because content is often published in dynamic web pages, where the contents of the page change with each request.
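One way to keep a signature stable on a dynamic page is to sign only a marked fragment, and to sign a canonical form of its text rather than the raw markup, so that ads, timestamps and whitespace changes around or inside it do not break verification. The Python sketch below assumes a hypothetical `data-signed-by` attribute carrying the author's key id and a canonicalization rule (Unicode NFC plus whitespace collapsing) that signer and verifier would have to agree on. It uses only the standard library's `html.parser`; the resulting digest is what the author's key would actually sign.

```python
from __future__ import annotations

import hashlib
import re
import unicodedata
from html.parser import HTMLParser

SIGNED_ATTR = "data-signed-by"  # hypothetical marker attribute holding the key id
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def canonicalize(text: str) -> bytes:
    """Normalize Unicode and collapse whitespace so cosmetic changes do not matter."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip().encode("utf-8")


class SignedFragmentExtractor(HTMLParser):
    """Collects (key id, text) for each element that carries SIGNED_ATTR.

    Assumes the markup inside a signed element is well formed, i.e. every
    non-void tag is explicitly closed."""

    def __init__(self) -> None:
        super().__init__()
        self.fragments: list[tuple[str, str]] = []
        self._key: str | None = None
        self._depth = 0
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag
        if self._key is not None:
            self._depth += 1  # nested element inside a signed fragment
            return
        key = dict(attrs).get(SIGNED_ATTR)
        if key is not None:
            self._key, self._depth, self._text = key, 1, []

    def handle_endtag(self, tag):
        if self._key is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            self.fragments.append((self._key, "".join(self._text)))
            self._key = None

    def handle_data(self, data):
        if self._key is not None:
            self._text.append(data)


def fragment_digests(html: str) -> list[tuple[str, str]]:
    """Return (key id, SHA-256 of canonical text) for every signed fragment."""
    parser = SignedFragmentExtractor()
    parser.feed(html)
    parser.close()
    return [(key, hashlib.sha256(canonicalize(text)).hexdigest())
            for key, text in parser.fragments]


if __name__ == "__main__":
    # Two renderings of the same page: different ad and timestamp, reflowed article.
    v1 = ('<body><div class="ad">Buy now! 12:01</div>'
          '<article data-signed-by="a1b2c3"><p>All about ball bearings.</p></article></body>')
    v2 = ('<body><div class="ad">Hot deals 12:07</div>\n'
          '<article data-signed-by="a1b2c3">\n  <p>All about\n  ball bearings.</p>\n</article></body>')
    print(fragment_digests(v1) == fragment_digests(v2))  # True
```

The trade-off is that anything the canonicalization throws away (here, markup and exact whitespace) is no longer protected by the signature.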
There is also the risk of signed content being used as bait. Say a page contains a signed article from a "known good" author (as per your preferences) in a 2pt font near the footer, and the rest of the page is pure rubbish. What then? It might be possible to build browser tools that look at the surface area of the rendered content on the page and warn the user if this kind of baiting appears to be happening (perhaps giving the user the option of adding the page to a personal or shared blacklist). I don't know, but this presents an interesting challenge.
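The surface-area check could be sketched like this in Python, assuming the browser can report the rendered boxes on a page, whether each box belongs to a verified signed fragment, and its font size. The `RenderedBox` type and both thresholds (legible signed content must cover at least 25% of the rendered area, in text no smaller than 9px) are arbitrary assumptions rather than measured values, and overlapping boxes are ignored for simplicity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RenderedBox:
    width: float    # rendered size in CSS pixels
    height: float
    font_px: float  # computed font size of the text in the box
    signed: bool    # True if the box belongs to a verified, signed fragment

    @property
    def area(self) -> float:
        return self.width * self.height


def looks_like_bait(boxes: list[RenderedBox], min_share: float = 0.25,
                    min_font_px: float = 9.0) -> bool:
    """Flag pages whose legible signed content covers only a small share of the page."""
    total = sum(b.area for b in boxes)
    if total == 0 or not any(b.signed for b in boxes):
        return False  # nothing rendered or nothing signed: no signature to misuse
    legible_signed = sum(b.area for b in boxes if b.signed and b.font_px >= min_font_px)
    return legible_signed / total < min_share


if __name__ == "__main__":
    # The scenario from the text: a signed article in 2pt (about 2.7px) type near the footer.
    baited = [RenderedBox(1000, 3000, 16, signed=False),
              RenderedBox(1000, 40, 2.7, signed=True)]
    honest = [RenderedBox(800, 2500, 16, signed=True),
              RenderedBox(200, 2500, 14, signed=False)]
    print(looks_like_bait(baited), looks_like_bait(honest))  # True False
```

A heuristic like this will have false positives (long comment threads under a short signed post, for example), which is one reason to let the user decide whether to blacklist the page rather than doing it automatically.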
It must also be possible to retrofit this onto existing technologies with minimal disruption and complexity. If you have ever had to write code that signs XML, you know how horribly complicated and brittle that is. Perhaps simpler mechanisms can be devised.
In any case, I would love to see someone actually prototype some of these ideas to find out whether they can be made to work or are completely unfeasible. I like the idea of having a mechanism for recording my personal preferences and letting them influence my search results. I would also love to have a compact UI element that shows me a summary of what I think of the author(s) of a given page (sort of like the SSL indicator in the address bar).
If you are a student at a university looking for a research topic, this might be something for you.