A Weblog entry posted on Tuesday, July 17, 2007

The hard work is almost over. You have launched your brand new website. Now what? How are your potential customers going to find you? Over the years, search engines have become smarter at finding the relevant content through all of the other noise. Here are some tips to start optimizing your website for maximum visibility.

The Almighty Index

The index is where the search engines store all of the relevant sites for any given keyword. This might seem like a no-brainer, but the truth is you have a little bit of say in this process through a file called robots.txt. This file allows you to give the search engines specific directions related to your website. You can tell them which pages they are free to roam, and explicitly let them know where they are not permitted. This step is very important to keeping a clean index with search engines, as it allows you to keep development, administrative, and other sensitive areas out of it. Keep in mind that robots.txt is a request that well-behaved spiders honor, not a lock on the door, so it is no substitute for real access control. Your journey is beginning: you just gave the search engine the permission it needs to scour your website. The search engine will now go to work searching through your content, picking out the relevant information, and storing it in the index.
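
To make the idea of "directions" concrete, here is a minimal sketch in Python that uses the standard library's robots.txt parser to check which URLs a rule set permits. The robots.txt contents, the example.com URLs, and the "ExampleBot" user agent are all made up for illustration; your own blocked paths will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the blocked paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /dev/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, agent="ExampleBot"):
    """Return True if the rules permit the given user agent to fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
for url in (
    "http://www.example.com/blog/",
    "http://www.example.com/admin/login",
    "http://www.example.com/dev/",
):
    print(url, is_allowed(parser, url))
```

Running a check like this before you launch is a cheap way to confirm that you are blocking what you meant to block, and nothing more.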

The Almighty PageRank

Search Engine Marketers love this word, PageRank. This is the value that Google will assign to any given page in your site. Unfortunately, the process of PageRank is not as easy to explain as that of your robots file. There are many factors that play into your site's PageRank. Here is my brief rundown of a few things you can do to achieve a solid PageRank:

  • Properly formatted title tags

    The title tag is what is seen at the top of your browser window, and what search engines will use in their results page. Don't just stuff keywords into this area; take time to plan and think these through. Think about the common uses of the title tag. When someone bookmarks your page in their browser, this is the default storage title. When customers come across your page in search engines, this is the text that will be relayed to them in the results page. Don't let this one slip through the cracks. Take your time to develop meaningful titles that correspond with the rest of the content on your page.

  • Content is king

    This is even more true with search engines. Search engines rely on algorithms to find relevance in your content. Having quality content should be a top priority. Remember when you are writing to keep things simple. There is a tendency to use catchy titles or slang. Though this isn't major, it would be beneficial to stick to relevant keywords. For instance, I could have titled this post "My site is finished, now what?" What relevant information is in that title? Mind you, this isn't just for search engines, but for humans as well. You want to write for humans first.

  • Use keywords wisely

    You need to find a good balance with your keywords. Too many and you will look like spam. Too few keywords, or too much slang, and it is tough to gauge the relevancy. Stuffing in keywords that aren't relevant to your content will hurt you in the long run. Understand your keywords and understand when and where to use them.

  • Give your content meaning

    This is more on the end of a developer who is creating your pages. Using semantic HTML will give your content rich meaning. This allows you to give proper weight to different sections. Text in headings will weigh more than text in a paragraph. Links and the text contained within will have great meaning, as you are letting the search engines know the relationship to the other pages. Avoid link text such as "click here" that gives the link no context. Be sure to use appropriate title and alt attributes. These attributes allow you to give proper weight to your links and your images, respectively. Having good, clean, quality markup allows the search engine to make better sense of your content and its relationship to the rest of your site and other sites. (Remember that relevancy stuff I keep referring to?)

  • Give the search engine a map

    You have used the robots file to tell the search engines where they can and cannot go; now let's help them out a little more by giving them a map. This map will help you qualify the content on your website. The good news is that search engines have agreed on a format for your sitemap file, and they have given you some flexibility. For each page, you can specify its location, the date it was last modified, how frequently it changes, and even a priority relative to the other pages on your site. These things don't directly affect what you will see on search engine results pages, but they help the search engine give context to your content.

  • Optimize your links

    Links are the most powerful thing on the Internet. They are how we get from one page to another, and how we come to find content. Be sure to give meaning to the links within your own site by optimizing the keyword text they contain. The same is true when you are linking to external websites: give proper keyword weight to the external site as well.
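
The sitemap described in the list above can be sketched in a few lines of Python. This assumes the sitemaps.org 0.9 format with its loc, lastmod, changefreq, and priority elements; the build_sitemap helper, the example.com URLs, and the values passed in are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Child elements a <url> entry may carry; only <loc> is required.
URL_FIELDS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Build sitemap XML from a list of dicts keyed by the URL_FIELDS names."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in URL_FIELDS:
            value = page.get(field)
            if value is not None:
                ET.SubElement(url, field).text = str(value)
    return ET.tostring(urlset, encoding="unicode")


# Illustrative pages only; substitute your own URLs and dates.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2007-07-17",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "http://www.example.com/contact/", "priority": 0.5},
])
print(sitemap_xml)
```

Generating the file from the same data that drives your pages means the map never drifts out of date.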

These are just a few of the things you can start to do to help build a healthy PageRank. There is no perfect solution to achieving a higher PageRank. However, if you fall into the trap of using tricky techniques, this will only hurt you in the long run. Be smart and write solid content.
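
A few of these checks are easy to automate. The sketch below uses Python's built-in HTML parser to pull out a page's title, find images that lack alt text, and flag links whose text carries no context. The MarkupAudit class, the list of "vague" phrases, and the sample page are my own assumptions for illustration, not a standard tool.

```python
from html.parser import HTMLParser

# Link text that gives a spider no context; an illustrative list, not a standard.
GENERIC_LINK_TEXT = {"click here", "here", "read more"}


class MarkupAudit(HTMLParser):
    """Records the page title, images without alt text, and vague links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images_missing_alt = []
        self.vague_links = []
        self._in_title = False
        self._link_href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and not attrs.get("alt"):
            # Note: this also flags alt="", which is valid for purely decorative images.
            self.images_missing_alt.append(attrs.get("src", ""))
        elif tag == "a":
            self._link_href = attrs.get("href", "")
            self._link_text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._link_href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._link_href is not None:
            # Collapse whitespace so "click   here" still matches.
            text = " ".join("".join(self._link_text).split()).lower()
            if text in GENERIC_LINK_TEXT:
                self.vague_links.append(self._link_href)
            self._link_href = None


# Hypothetical page used only to exercise the audit.
SAMPLE = """<html><head><title>Search Engine Optimization Basics</title></head>
<body><h1>SEO Basics</h1>
<img src="spider.png">
<img src="logo.png" alt="Company logo">
<p>For more, <a href="/articles/">click here</a> or read
<a href="/articles/robots/">our guide to robots.txt</a>.</p>
</body></html>"""

audit = MarkupAudit()
audit.feed(SAMPLE)
print(audit.title, audit.images_missing_alt, audit.vague_links)
```

A script like this will not tell you whether a title is any good, only whether the basics are present. The judgment calls are still yours.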

The Almighty Spider

Search engines find your content through the use of a spider. It is called a spider because it crawls through your pages searching for content. It is important to understand that a spider has some limitations, and to know how you can build your content to avoid the common pitfalls. A few of the pitfalls are:

  • Spiders cannot see inside of images or Flash movies

    Content that is stored inside of these will not be discovered by the spider. Be sure to embed your Flash properly, and make proper use of the alt attribute with your images. Options such as SWFObject allow you to embed your Flash unobtrusively and provide alternative text to those who don't have Flash available. However, Sherwood points out in the comments that in a recent Q&A, Dan Crow of Google pulled no punches on SWFObject: he characterized it as ‘dangerous’.

  • Spiders do not parse JavaScript

    JavaScript is often used to enhance a user experience. JavaScript has a tainted past with poor developers using the wrong tool for the job. One of the ways it has previously been used is for page re-direction. Using JavaScript to handle your page re-direction is like shutting the front door on the spider. Be sure to use the right tools for the job. JavaScript, when used properly, can enhance a customer's experience. When used poorly, it will only frustrate the user and eliminate visibility of important content to the spider.

  • Spiders are still having trouble with AJAX

    Simply put, AJAX allows an otherwise normal web page to respond like a desktop application without having to refresh the page for information from the server. All of the work is done behind the scenes. This is great for the end user, but since the J inside of AJAX stands for JavaScript, spiders are still having trouble processing the content returned inside of the requests. With so many sites implementing AJAX in different ways, the spider simply doesn't understand the logic or how to process the requests accordingly, which effectively hides possible quality content. Be sure to implement AJAX wisely and let the spider have access to the content at the core.

All of this goes back to giving your content meaning: make sure that all of your content is accessible to all, then apply your enhancements progressively.
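
To picture what a spider with these limitations sees, here is a toy model in Python. It is only a sketch under my own assumptions (the NaiveSpider class and the sample page are invented, and no real search engine works exactly like this): it reads the markup as delivered, collects text, alt attributes, and links, and never executes a script.

```python
from html.parser import HTMLParser


class NaiveSpider(HTMLParser):
    """Collects the text and links available without running any scripts."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            # The alt attribute is the only thing an image contributes.
            self.text.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script bodies arrive here as raw text; they are ignored, never executed.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


# Hypothetical page: one plain link, one link and one redirect that exist only in JavaScript.
PAGE = """<body>
<h1>Widget catalog</h1>
<a href="/widgets/">All widgets</a>
<img src="widget.png" alt="Blue widget">
<script>
  document.write('<a href="/specials/">Specials</a>');
  window.location = "/new-home/";
</script>
</body>"""

spider = NaiveSpider()
spider.feed(PAGE)
print(spider.links)
print(spider.text)
```

The /specials/ link and the redirect to /new-home/ never show up, because they only come into existence when a browser runs the script. Anything you want indexed should be in the markup itself.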

Conclusion

Search engine optimization is a tricky subject. This article has only scratched the surface on a broad topic. Through this article you have seen how to manage your index by letting the search engine spider know where it can go, how it can make sense of all of your content, and how to make sure your content is accessible. So, your site is launched - now go and make it visible to your customers!

Resources

Be sure to keep tuning in as we discuss more ways that you can optimize your site for search engines.

Comments

Awesome write-up. :) I have this image now in my head of a large, glowing spider, due to the "Almighty Spider" header. ;) But on a serious note, this was good stuff. I didn't know that Search Engines didn't parse javascript. Good to know.
Posted by Jina Bolton on Jul 17
Wow!
Posted by Nathan on Jul 17

@Jina

The issue with Javascript is that it requires the Spider to have a complete understanding of the available methods to know what to do with them. The browsers know what to do with them because they know how to parse Javascript. Spiders are crawling on the backend of your website sifting through your content.

I haven't found a case where spiders understand Javascript. I am sure there have been some attempts, but Javascript can be written in so many different ways - that it would also require the spider to know the little nuances.

An old black hat technique people would do is use Javascript to sniff the user agent and serve up different content to a search engine (effectively dubbed cloaking).

So, in short - it would require a spider to carry around a backpack with a full Javascript parser to even make sense of what you have on your pages inside of your Javascript. This would make indexing your website take much longer, as the spider would then have to parse and make sense of the Javascript on your pages.

I am sure there might be different schools of thought on this, but my research has taught me to not hide content in Javascript for this exact reason.

Posted by Nate Klaiber on Jul 17
Great article, Nate! A lot of useful info here. Now if you'll excuse me, I have to go back and rework a few of my sites for the Almighty Spider.
Posted by Brad Dielman on Jul 17

Good article Nate! However, one thing you didn't mention was how important it is to write up some 301 redirects pointing your old content to your new if you've switched to a new CMS and/or URL structure. Otherwise, any PageRank your old site had will be lost.

And speaking of 301s, another thing to think about is canonicalization. Search engines can see www.yourdomain.com, domain.com/, www.domain.com/index.html, etc as different locations... You can change this in your .htaccess file with a 301 redirect (and in Google Webmaster Central, but obviously that's only for Google). You should also link to the URL you choose for your website (both internally and externally) consistently so as to not dilute the strength of your site.

Here's what you should add to your .htaccess to make sure it always goes to the www. URL of your site:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^yourdomain\.com
RewriteRule ^/(.*) http://www.yourdomain.com/$1 [R=301,L]
Posted by Brendan Cullen on Jul 18

This is the first time I have come across your website and it's a great article - and a great design too.

I've been either working in or studying the web design industry for the past 8 years, but it is only now that I am becoming interested in SEO and getting clients' websites into Google and other search engines.

The thing that people must realise is that it takes time. The "spiders" can only visit so many sites in a day, so you must be patient when it comes to seeing how your hard work has paid off. Also, I'd recommend doing small tweaks rather than one big one as it allows you to easily gauge what effect it has had.

Nate, is there any reason that you didn't mention the META "Description" tag? This is still used by so many search engines, even to the extent of it being displayed on the search engine results page (SERP).

Posted by Oliver Coningham - Website Design in Yeovil on Jul 18
@Brad

You must appreciate the Almighty Spider and the power it holds.

@Brendan

You are exactly right. I think I might go back and squeeze that information in as well. Preserving your link structure is one of the best things you can do for your visitors and spiders. I think this goes in the planning stage. With a new site you want to plan out your URLs accordingly, and with an old site you want to keep your index and re-direct accordingly. On the previous site I worked on, it was a little tougher to handle all of the re-directs, as the old URLs relied heavily on query strings. So I had to use RewriteCond on %{QUERY_STRING} in conjunction with RewriteRules.

I had this discussion yesterday at Standardzilla. It is not only the canonicalization of www or non-www that matters, but also managing your trailing slashes. If these are neglected, then it will be reflected in your analytics as well as your index. So, if you have a page like www.domain.com/blog/ - ensure that you either keep or drop the trailing slash. If you just created this page without thinking, then it could be translated into four different pages (with and without the www, each with and without the trailing slash). Having one point of entry will help keep your index and analytics neat and tidy.

Also, thanks for pointing out the preference selection in Google Webmaster Central. Like you said, it is only Google - but that is one of the bigger search engines to build for. Doing everything else that you mentioned in your comment will help make sure the other search engines know your preference as well.

@Oliver

Thanks for the kind words!

You hit another very important aspect. After you plant the seed, it takes time to grow. Your index will not magically change overnight. I guess this is where the fun comes in. As the developer you can watch your index grow and study your analytics (a topic for another day). The nature of the web allows you to easily shift gears if you see something isn't working as you planned. Those small tweaks can help immensely in the long run.

In general, the meta keywords and description play a small role. Over the years, spammers have abused them by stuffing in non-relevant keywords to try and up their rankings in different areas. So, while it is important - the weight it has has gone down over the years. However, I will refine this information and append it where necessary in the article.

Personally, I still use them. I think that if you use them with care they can be mildly effective. Use them on every page of your site, but make sure they are unique: each page should have a description and keywords relevant to that specific page. Don't stuff in keywords and repeat the same set sitewide; use them wisely. You don't want a user coming to a contact form, and the keywords and description to be a bunch of stuff about your homepage. Make them relevant and they can work for you. Also, don't go overboard with the keywords.

You are correct, the meta tag is still in use by search engines. Here is an example from Barbour Publishing's index. You will notice that each description is unique to the book detail page, specifically.

Posted by Nate Klaiber on Jul 18
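
The four URL variants described in the comment above can be collapsed to a single point of entry with a small normalizing function. The sketch below is in Python rather than .htaccess, purely to show the logic. The canonical_url helper is hypothetical, and preferring the www. host with a trailing slash is an arbitrary choice: the point is only that you pick one form and stick to it.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, bare_host="domain.com"):
    """Map www/non-www and slash/no-slash variants of a URL to one form.

    Assumes the preferred form is the www. host with a trailing slash on
    directory-style paths; both preferences are illustrative choices.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == bare_host:
        host = "www." + bare_host

    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Treat a final segment without a dot as a directory and add the slash.
    if last_segment and "." not in last_segment:
        path += "/"

    return urlunsplit((parts.scheme or "http", host, path, parts.query, ""))


variants = [
    "http://www.domain.com/blog/",
    "http://domain.com/blog/",
    "http://www.domain.com/blog",
    "http://domain.com/blog",
]
for variant in variants:
    print(variant, "->", canonical_url(variant))
```

On a live site the same decision would be enforced with 301 redirects, and every internal link should already point at the form you chose.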

Nate - Nice write-up

The great thing about some of this SEO stuff is that SEO and accessibility go hand-in-hand. If the Google Spider can't get to parts of your site, then people with JavaScript or Flash turned off won't be able to see it either. It just goes to show how important it is to think these things through, because doing so helps you out in many different ways.

@Brendan - Good call about the 301s. Changing the URL structure of a website is always a pain!

Posted by Dan Ott on Jul 18

Great discussion! There are a lot of things that haven't been thought through as far as SEO is concerned for our website. I'm making a list based on the suggestions in this article to see what our hosting company can provide to optimize it.

I really appreciate articles like this...spoken in a voice that makes sense, even for those who don't eat, sleep and breathe Search Engine Optimization.

Posted by Bridget Stewart on Jul 18

Nice article, Nate. I have a question...the answer to which I've always assumed. AJAX requests for content that's already on the page, for example in a hidden div, are indexable by engines and are seen by spiders, correct? The issue with AJAX as you point out has to do with requests that are pulling in content from external sources/files, correct?

Posted by hcabarcas on Jul 22

@hcabarcas

There are several things to note here. You are looking at two different methods:

  • DOM Scripting

    Using this method allows you to construct elements on the fly. You could create a whole page if you desired. This is the type of thing the lightbox script uses, where it builds the page on the fly - or it immediately assigns a div to be hidden and then displayed when called upon by a specific action. Using this, the spiders can effectively crawl your content. One point to note is that I would make sure you initially hide the div with JavaScript in case a user doesn't have JavaScript enabled. Doing it directly in the code would render your content hidden with no way to display it.

  • AJAX

    The A in AJAX stands for asynchronous. This means that a request will always be made in the background. The content is only fetched whenever it is requested by the events you choose. The content is not on the page in a hidden div anywhere, but can be written to a div - or you could use a mixture of DOM Scripting and AJAX and build its container on the fly. Regardless, AJAX needs to send a request in the background to fetch the content. This is where search engines get hung up. These requests are still counted as hits on your internal webserver, because a request is being made for the content - but search engines, with their inability to follow the JavaScript commands, will not be made aware of the page being updated, nor do they see content that is appended to the DOM via JavaScript. These are just a few pitfalls, some of which I am still researching.

Hope that helps!

Posted by Nate Klaiber on Jul 23

I heard many stories,
but yours stands alone ... Best technically.

Posted by vasundhar on Jul 24

I found that the redirect code provided by Brendan Cullen above did NOT work.

I think the exclamation mark (which means 'NOT') in the second line should either be omitted or followed by www\.

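The snippet below does work (on my Apache2 server on Linux), i.e. it adds www to all requests that do not have it:

# Turn on the rewrite engine
RewriteEngine on
# Match only hosts that start with the bare domain (no www.), ignoring case
RewriteCond %{HTTP_HOST} ^bristol-online\.com [NC]
# Permanently (301) redirect to the same path on the www. host
RewriteRule ^(.*) http://www.bristol-online.com/$1 [R=301,L]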

Posted by Gary Prosser on Jul 27
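
The difference between the two RewriteCond lines in this thread is easier to see when the patterns are run against a few host names. The sketch below mimics the condition logic with Python's re module. That is an approximation (Apache uses its own regex engine, although these simple patterns behave the same way), and the would_redirect helper is hypothetical.

```python
import re


def would_redirect(host, pattern, negate=False, ignore_case=False):
    """Mimic a RewriteCond on %{HTTP_HOST}: True means the RewriteRule may fire."""
    flags = re.IGNORECASE if ignore_case else 0
    matched = re.search(pattern, host, flags) is not None
    # A leading "!" on the condition pattern inverts the result.
    return not matched if negate else matched


# Condition from the earlier snippet: !^yourdomain\.com
EARLIER = dict(pattern=r"^yourdomain\.com", negate=True)
# Condition from the corrected snippet: ^bristol-online\.com [NC]
CORRECTED = dict(pattern=r"^bristol-online\.com", ignore_case=True)

for host in ("yourdomain.com", "www.yourdomain.com"):
    print("earlier  ", host, would_redirect(host, **EARLIER))

for host in ("bristol-online.com", "www.bristol-online.com"):
    print("corrected", host, would_redirect(host, **CORRECTED))
```

With the negated pattern the condition holds for the www. host and fails for the bare domain, which is the opposite of what was intended. The corrected snippet also drops the leading slash from the RewriteRule pattern, which matters in an .htaccess file because Apache strips that slash from the path before matching.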

I attended a session with a Google engineer a couple weeks ago, where he characterized swfobject as a "dangerous" technique for optimizing Flash for search engines. He said that he could not guarantee that swfobject users would not be penalized, even if their implementation was totally above-board:

http://www.searchmatters.net/2007/07/16/google-flash-fixes-can-be-%e2%80%9cdangerous%e2%80%9d/
Posted by sherwood on Aug 02

Useful article for understanding how search engines know a website exists. Another useful article, describing the process and steps a search engine crawler goes through when it follows a link, is at http://www.marshalseo.com/articles/how-search-engine-watch-your-website.html

Posted by Suresh Chowhan on Aug 10

Nate,

Don't forget the mighty, must-have custom error pages. They matter not only for spiders but for humans as well. No one wants to come from a search engine and see "page cannot be found". That is where the custom error page comes into play. See here for an example:

http://www.unleashedideas.com/nate

The page above, called nate, does not exist, so instead of my losing a visitor, they can still navigate around my site.

Posted by Brandon Livengood on Aug 11

Nate: Fabulous article! I've just begun dipping my toe into the pool of SEO and I was finding it a bit chilly. I'm feeling much more confident now and I'm ready to learn more. I'm keeping your article for reference and, like Bridget, I've started a task list of things to do for my company's site.

@brendan I was always dimly aware of the different HTTP response codes, but never really looked at them seriously. The powers-that-be want to have pages that come and go, and this will give me some much-needed ammunition against that idea. I knew it was bad, but was having trouble coming up with real reasons why. I'll also be using that mod_rewrite rule.

Posted by Dana on Nov 19

Comments are closed.
