Sunday, March 18, 2012

The real problem with search engine optimization... or the web?

This evening, I read a news story that Google is Planning to Penalize Overly Optimized Sites, which I found through Slashdot. What really interests me is that the way I found the story is itself an illustration of a search engine optimization problem. It turns out that the real "beef" of the story lies behind a chain of links: 5 degrees to the secondary source, and 2 more degrees to the primary source. Each time the story is linked, another person either adds a little more insight or just summarizes the linked story slightly differently without adding much value.

Here is the link structure of the story.

I draw the distinction between primary and secondary sources as follows: a secondary source creates new insight from the primary source, whereas the primary source is just a recording of facts without any insight. In this case, the primary source included an audio recording of a panel discussion, and the secondary source highlighted pieces of it with its own interpretation.

Note that CNET also links to the primary source on sxsw.com, but without indicating that it is the primary source.

I think this particular example of the search engine optimization story illustrates the problem very well. In primary school, you were taught that every reference is either a primary source or a secondary source, but the reality is always more complicated. Some secondary sources are primary sources in some respects, and tertiary sources can still add value (or you can argue that the third category is also a secondary source). The web makes it much easier to link, and now people also have a financial incentive: the more they get cited, the more ads they can show, and that gives them a profit. They can do that without generating any original content, or they can add a little value by adding their own interpretation. The ad revenue they generate, on the other hand, depends on how much effort they put into promoting their post, so that other people will want to link to them.

Let's put the financial motive aside and assume that people would still link even without ad revenue. If a person can get away with showing tons of ads but actually adds tons of value to some topic, then why penalize him?

The issue is that getting to the "beef" of the story is still a graph search problem. The search engine (e.g. Googlebot) supposedly has all the link graph information, but it does not understand the distinction between primary and secondary sources, so it is up to the reader to spend the time and effort investigating. With the amount of information exploding each day, we really need a search engine that will do better. If it's too much to ask a search engine to understand the difference between primary and secondary sources, I think it is at least plausible to have a tool that highlights the contribution of each page with regard to a particular aspect or topic, like a "diff" tool with fuzzy syntactic matching.
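
To make the "diff" idea a bit more concrete, here is a rough sketch in Python of what such a tool might do at its simplest. It takes the text of a page and the text of the source it links to, and keeps only the sentences of the page that have no close match in the source. Everything here is my own assumption for illustration: the function names, the naive sentence splitter, the 0.6 similarity cutoff, and the sample texts are all made up, and the matching is plain character-level similarity from the standard library's difflib, which is much cruder than what a real search engine would need.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def novel_sentences(page, source, threshold=0.6):
    """Return the sentences of `page` that have no close match in `source`.

    `threshold` is an arbitrary cutoff chosen for illustration; a sentence
    counts as "already in the source" if its best character-level similarity
    ratio against any source sentence reaches the threshold.
    """
    source_sentences = split_sentences(source)
    novel = []
    for sentence in split_sentences(page):
        best = max(
            (SequenceMatcher(None, sentence.lower(), s.lower()).ratio()
             for s in source_sentences),
            default=0.0,  # an empty source means every sentence is novel
        )
        if best < threshold:
            novel.append(sentence)
    return novel


# Made-up sample texts, not quotes from any real article.
source_text = (
    "Google plans to penalize overly optimized sites. "
    "The announcement came during a panel."
)
page_text = (
    "Google is planning to penalize overly optimized sites. "
    "I think this will hurt small blogs the most."
)

for sentence in novel_sentences(page_text, source_text):
    print("NEW:", sentence)
```

Run along every edge of the link graph, something like this would flag the pages that merely reword what they link to, and surface the few sentences where a page actually says something new.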