Here is the link structure of the story.
- slashdot.org Google is Planning to Penalize Overly Optimized Sites by tekgoblin to promote their own website, which links to:
- tekgoblin.com Google is Planning to Penalize Overly Optimized Sites by SANITY, which links to:
- news.cnet.com Google plans to penalize 'overly optimized' sites by Edward Moyer, which links to:
- searchengineland.com Too Much SEO? Google's Working On An "Over-Optimization" Penalty For That by Barry Schwartz, which is self-plagiarism and links to:
- seroundtable.com Cutts: Google To Target Overly SEO'ed Sites Within Weeks by Barry Schwartz, which is a secondary source of the relevant information, citing its primary source from:
- seroundtable.com Audio From SXSW Google's Cutts, Bing's Forrester & SEL's Sullivan by the same author, which cites its primary source from:
- sxsw.com Dear Google & Bing: Help Me Rank Better! which contains the primary source audio clip with statement from Matt Cutts.
The distinction between a primary source and a secondary source is that a secondary source creates new insight from the primary source, whereas the primary source is just a record of facts without any insight. In this case, the primary source is an audio recording of a panel discussion, and the secondary source highlights pieces of it with its own interpretation.
Note that CNET also links to the primary source on sxsw.com, but without indicating that it is the primary source.
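The chain above can be read as a small directed graph. Here is a minimal Python sketch, assuming each page is reduced to a short label (the labels and the `find_sinks` helper are made up for illustration, not real URLs or an existing tool). It does a breadth-first search from the Slashdot post and reports the pages that link no further, along with how many hops away they are.

```python
from collections import deque

# Hypothetical model of the link chain listed above. The keys are short
# labels invented for this sketch, not real URLs.
LINKS = {
    "slashdot": ["tekgoblin"],
    "tekgoblin": ["cnet"],
    "cnet": ["searchengineland", "sxsw"],  # CNET also links straight to sxsw.com
    "searchengineland": ["seroundtable-cutts"],
    "seroundtable-cutts": ["seroundtable-audio"],
    "seroundtable-audio": ["sxsw"],
    "sxsw": [],
}


def find_sinks(links, start):
    """Breadth-first search from `start`.

    Returns a dict mapping each reachable page that has no outgoing
    links to its shortest hop distance from `start`.
    """
    distance = {start: 0}
    queue = deque([start])
    sinks = {}
    while queue:
        page = queue.popleft()
        targets = links.get(page, [])
        if not targets:
            sinks[page] = distance[page]
        for target in targets:
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return sinks


print(find_sinks(LINKS, "slashdot"))
```

A page with no outgoing links is only a candidate for the primary source. The search itself knows nothing about who added insight and who merely repeated it.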
I think this particular example of the search engine optimization story illustrates the problem very well. In primary school, you were taught that every reference is either a primary source or a secondary source, but the reality is always more complicated. Some secondary sources are primary sources in some respects, and tertiary sources can still add value (or you can argue that the third category is also a secondary source). The web makes it much easier to link, and now people also have a financial incentive: the more they get cited, the more ads they can show, and that gives them a profit. They can do that without generating any original content, or they can add a little value by adding their own interpretation. The ad revenue they generate, on the other hand, depends on how much effort they put into promoting their post, so that other people will want to link to them.
Let's put the financial motive aside and assume that people would link even without ad revenue. If a person gets away with showing tons of ads but actually adds tons of value to some topic, then why penalize them?
The issue is that getting to the "beef" of the story is still a graph search problem. The search engine (e.g. Googlebot) supposedly has all the link graph information, but it does not understand the distinction between primary and secondary sources, so it's up to the reader to investigate by spending time and effort. With the amount of information exploding each day, we really need a search engine that does better. If it's too much to ask a search engine to understand the difference between primary and secondary sources, I think it is at least plausible to have a tool that highlights the contribution of each page with regard to a particular aspect or topic, like a "diff" tool with fuzzy syntactic matching.
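To make the "diff" idea a little more concrete, here is a rough Python sketch, assuming plain-text input and a naive sentence splitter. It uses the standard library's `difflib.SequenceMatcher` for the fuzzy matching. The function names and the 0.6 threshold are arbitrary choices for illustration, and character-level similarity is only a crude stand-in for whatever matching a real tool would need.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def novel_sentences(source, derived, threshold=0.6):
    """Return the sentences of `derived` with no close match in `source`.

    A sentence counts as "already in the source" when its best
    SequenceMatcher ratio against any source sentence reaches
    `threshold` (an arbitrary cut-off picked for this sketch).
    """
    source_sents = [s.lower() for s in split_sentences(source)]
    novel = []
    for sent in split_sentences(derived):
        best = max(
            (difflib.SequenceMatcher(None, sent.lower(), s).ratio()
             for s in source_sents),
            default=0.0,
        )
        if best < threshold:
            novel.append(sent)
    return novel


# Made-up sample texts, not quotes from any of the pages above.
source = ("Google is working on a penalty for over-optimized sites. "
          "The change will arrive within weeks.")
derived = ("Google is working on a penalty for overly optimized sites. "
           "Site owners should focus on content instead.")

for sentence in novel_sentences(source, derived):
    print("+", sentence)
```

Whatever survives the filter is, roughly, the page's own contribution. A page that only rephrases its source would produce an empty list.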