A Weblog entry posted on Tuesday, September 04 2007
The good, bad, and ugly of the social web
Users try to find things on the web. Search engines assist those users in locating exactly what they are looking for. What happens when content is duplicated all over the web? Who is the proper owner and where is the proper source? The web has seen a rise in technologies for sharing content: RSS feeds; web services such as SOAP and REST, which make it easy to share your content with other websites via an API; and Microformats, which allow you to mark up your documents to give more semantic meaning to your content. All of these tools allow you to share content, and to consume shared content. All of this is good, so what is the problem?
The good
The web today is a very social web. It is no longer just the hyperlink that manages how you can navigate. You can have your web content pulled into desktop applications and widgets. You can access the content from your mobile phone or PDA. All of these things are good things.
I can post my events to upcoming.org and then use their API to pull in my recent events into my personal website. I can post my pictures from vacation to flickr.com and then use their API to pull in my recent photos into my personal website. I can post my latest book reviews to revish.com and then use their API to access my reviews and other related information. I can bookmark websites and add them to Ma.gnolia.com and then use their API to access my bookmarks. Add to this the array of social networking sites such as facebook.com, twitter.com, digg.com, and pownce.com, and you have a very twisted web of sharing information. All of this is good. It allows us to share our content with very little hassle and provides maximum exposure.
The bad
The very twisted web of sharing information is bound to get tangled rather quickly. The bad part comes when search engines or other users try to find the originating source of documents, photos, and other information. Let's look at an example:
I could post a review to my website. All of my reviews are marked up with Microformats. Someone else could easily syndicate these reviews via my RSS feed to their website. This would include the review in its entirety, including the markup I use for Microformats. So, now my review is posted in its entirety on someone else's website. What if this is a site that has a higher search engine ranking than my personal site? Say a user searches for the review and finds the other site before they find my site. Without proper citation, how are they to know that the author of the review was actually me and not the site owner who is syndicating the content?
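To make the citation point concrete, here is a minimal sketch of how a syndicating site could read the original link and author out of an RSS 2.0 item and print an attribution line next to the imported content. The feed contents, URLs, and the function name `attribution_lines` are hypothetical and exist only for illustration. It also assumes the feed fills in the optional `author` element, which many feeds leave out.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents, used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Reviews</title>
    <link>http://example.com/</link>
    <description>Hypothetical review feed</description>
    <item>
      <title>Review: An Example Book</title>
      <link>http://example.com/reviews/an-example-book</link>
      <author>reviewer@example.com (Example Reviewer)</author>
      <description>Full text of the review.</description>
    </item>
  </channel>
</rss>"""


def attribution_lines(feed_xml):
    """Return one attribution string per item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    lines = []
    for item in root.iter("item"):
        # Fall back to placeholders when an optional element is missing.
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        author = item.findtext("author", default="unknown author")
        lines.append(f"{title}, originally published by {author} at {link}")
    return lines


if __name__ == "__main__":
    for line in attribution_lines(SAMPLE_FEED):
        print(line)
```

The feed already carries a pointer back to the source in each item's `link` element. The problem described above is what happens when the consuming site never shows it.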
Sites like virb.com and facebook.com allow you to import your RSS feeds into your account. They pull in your posts in their entirety, and even allow users to comment on your posts from within their website (completely de-coupling the source and the original conversation). To me, this is bad. Yes, you gave permission to syndicate your content - but did you ask for the conversation layer to take place inside of their application? While adding the conversation layer is not inherently bad, it makes the task of keeping unique content in one place very difficult. In some cases it makes the task of locating the original source a chore for a user locating the content. This is bad.
The ugly
All of this sharing and syndicating assumes that we are dealing with professional and ethical developers. However, we all know that the web has its fair share of unethical developers as well. How do you stop them from maliciously using your information? What is to stop someone from using Microformats to mark up their content with my information? What is to stop someone from creating an hCard with my details, pretending to be me? Further yet, how does a search engine or user know how to distinguish between the two? This is the ugly.
The challenge
When I first discovered revish.com I thought it was an incredible resource. I still feel that way. One of the core principles of revish is to eliminate duplication. Therefore, if I have already posted the review to my personal site, then I shouldn't re-post my review to revish. But revish has a different audience and community that might be able to benefit from reviews of specific books, so how can this be managed? I can't post my reviews to revish and then syndicate to my site, because then I would have the hassle of re-formatting the markup to fit my site and Microformats - all for my own content. What happens if revish just imports reviews from my website and doesn't change any of the formatting? Now they have imported all of my Microformats information - well, almost. I use the include pattern to cite myself as the reviewer. This information does not travel with my RSS feeds, and therefore wouldn't be posted to revish as part of the complete source. This is a challenge.
The solution
I am not sure there are solutions to many of the problems explained above. There are more problems beyond what has been discussed here; this is only scratching the surface. Here are a few steps I think we can take to avoid tangling the web:
- Cite your sources. Take time to hunt down the originating author and cite the source properly. Know the markup available to you for citing a source properly, so that search engines can make better sense of the data.
- If you are syndicating content from another source, be sure to strip out unnecessary markup that may be specific to the originating website.
- If you are syndicating content from another source, be sure to syndicate directly from the source - not a syndication from a syndication. This is where the web can easily become tangled.
- Be conscious of who is using your content. In an ideal world everyone would use your content for good. Since that is not the case, try to keep tabs on where your content is being used and how it is being used.
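
The first two steps can be sketched in code. This is a rough illustration, not a definitive implementation: it treats `class`, `id`, and `style` attributes as the site-specific markup to strip (my assumption), and the names `AttributeStripper` and `syndicate`, the example URL, and the author name are all hypothetical. Note that stripping `class` also removes Microformats class names, so the copy no longer presents itself as the original hReview or hCard.

```python
from html import escape
from html.parser import HTMLParser

# Attributes treated as site-specific in this sketch (an assumption).
SITE_SPECIFIC_ATTRS = {"class", "id", "style"}


class AttributeStripper(HTMLParser):
    """Rebuild an HTML fragment while dropping site-specific attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _open_tag(self, tag, attrs):
        kept = [(k, v) for k, v in attrs if k not in SITE_SPECIFIC_ATTRS]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        return f"<{tag}{rendered}>"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, with no end tag.
        self.parts.append(self._open_tag(tag, attrs))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser decodes character references, so re-escape the text.
        self.parts.append(escape(data, quote=False))


def syndicate(html_fragment, source_url, author):
    """Strip site-specific attributes and append a citation to the source."""
    stripper = AttributeStripper()
    stripper.feed(html_fragment)
    stripper.close()
    cleaned = "".join(stripper.parts)
    citation = (
        f"<p>Originally published by <cite>{escape(author)}</cite> at "
        f'<a href="{escape(source_url, quote=True)}">{escape(source_url)}</a>.</p>'
    )
    return cleaned + citation


if __name__ == "__main__":
    original = '<div class="hreview" id="r1"><p class="description">A fine book.</p></div>'
    print(syndicate(original, "http://example.com/reviews/1", "Example Reviewer"))
```

HTML comments in the source are dropped as a side effect, because the parser's default comment handler does nothing. A real importer would also want to sanitize scripts and other unsafe content, which this sketch does not attempt.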



You can opt to turn off comments on imported posts in Virb. Not sure about Facebook though. I think Virb handles it well.
@Jina
I am not attacking Virb or any of the social networking sites for that matter. I like that Facebook has 'link to original post' at the very top, whereas on Virb it is at the bottom. At least they both have the link listed.
But the issue is much wider than just these two sites. There are many others. And it isn't just syndicating a standard blog post.
Looking from a search engine perspective - how are they supposed to know who the real author is? How does Technorati know that the hCard on my site should be the definitive hCard and I am the real 'Nate Klaiber'? Without some sort of authentication (microformats) or citation (all other documents) - it gets very hard to really distinguish the original author.