The Building Blocks of a Great Website: A Starter Kit

Every web developer has their own style, from the way they code their CSS to how they write their JavaScript, but there are some standards. These standards allow us to quickly and easily generate code that is generally beneficial to the client when setting the foundation for a website. These standardized tools have developed over time, ever since search engines began to categorize the World Wide Web in the early 1990s. Unfortunately for clients, they are often overlooked by web developers who either don't understand them or don't realize they exist.

Each of the following topics has been covered in detail across the internet in many different locations. Our goal is to bring them together under a single roof to demonstrate our building process when developing a website.

 

www or no www: You Have a Choice

We begin by deciding whether to utilize the "www" within the site URL. There are strong cases on both sides, but we will not be covering them in depth here; if you would like to read more on this topic, see "WWW or No WWW, That's the SEO Question" for a fuller explanation.

From Matt Farina: "Back in the early days of the Internet the 'www.' mattered. It was a way of getting you to the right content on the right server. It was a necessary thing. The early days of the Internet are long gone and the need for this fell off the charts long ago. But, support has remained and now when Google indexes your site it indexes the 'www.' version and the non-'www.' version. The open source content management system Joomla! is a recent example of this hurting a site's ranking. The site www.joomla.org recently took a hit in the rankings while joomla.org stayed high. The reason for this is that they are seen as 2 separate sites and the sites were ranked differently by the Google wizardry."

Even though we won't be making the decision for you, we typically elect to keep the "www" because it is the most commonly recognized URL structure. For example, when visiting http://doc4design.com the user is redirected to http://www.doc4design.com. We want traffic directed to a single domain, and by adding this redirect we ensure that the majority of search engines see and catalog only a single version of the site. If the site retains user information such as login names and passwords, having a single version of the site also prevents storing two copies of that data.

Whichever choice you make, try to keep it consistent across all platforms, including printed and interactive materials.

Setting up your .htaccess file for a redirect: open the .htaccess file located in your root directory and add the appropriate code below, changing "doc4design" to your own domain. If you do not have a .htaccess file, create one by opening the text editor of your choice (Notepad, TextEdit), add your code choice below, and save the file as "htaccess.txt". Upload the file to the root directory of your site and rename it to ".htaccess". We rename the file after uploading because files prefixed with a "." are hidden.

INCLUDE WWW

RewriteEngine On
RewriteCond %{HTTP_HOST} ^doc4design\.com$ [NC]
RewriteRule ^(.*)$ http://www.doc4design.com/$1 [L,R=301]

REMOVE WWW

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

 

ROBOTS.TXT File

It's safe to assume all search engine spiders crawling a website have an innate need to catalog as much as they can. This includes cataloging anything from personal files to those eBay images you may have secretly added to the company File Transfer Protocol (FTP) site. If this sounds unnerving, it should. Luckily there is a way to tell the spiders not to catalog everything, which is what they do by default.

The first file search engine spiders look for when visiting a website is the robots.txt file. This is a plain text file, similar to the .htaccess file, which can be created and opened in the simplest of programs, from Notepad to TextEdit. The robots.txt file is simply a list of instructions informing the spiders what they can and cannot do with the files and folders on the server. In the case above it might be a good idea to keep the spiders out of the personal and eBay folders, and setting up a robots.txt file is pretty easy. The examples below will give you an idea of some code you might include. If you do not have a robots.txt file, create one by opening the text editor of your choice (Notepad, TextEdit), add your code choice below, and save the file as "robots.txt". Upload the file to the root directory of your site and you're all set.

CATALOG EVERYTHING

User-agent: *
Disallow:

CATALOG EVERYTHING EXCEPT TWO FOLDERS

User-agent: *
Disallow: /personal/
Disallow: /eBay/

CATALOG EVERYTHING EXCEPT A SPECIFIC FILE

User-agent: *
Disallow: /personal/mypage.html

Even if you would like the entire site to be cataloged by default, that doesn't mean you should skip the robots.txt file. It should still be included to prevent server-side logs from filling with 404 errors (the error you see when a page is not found, for example: addthis.com/qwefkujn). At a minimum, install a blank robots.txt file in your root directory. For more, see the Wikipedia article on robots.txt.

 

Include a Sitemap

Sitemaps provide several benefits, including easier navigation and better visibility to search engines. They provide the means for informing a search engine that your site exists or that changes have been made that warrant a second look. If you do not utilize a sitemap your site will eventually be indexed, but at a much slower pace. A sitemap should be generated once the website is complete, with the goal of including every link and piece of information. If you are using WordPress, generating a sitemap.xml and sitemap.xml.gz file is easily accomplished through a single plugin: Google XML Sitemaps for WordPress.

"This plugin will create a Google sitemaps compliant XML-Sitemap of your WordPress blog. It supports all of the WordPress generated pages as well as custom ones. Every time you edit or create a post, your sitemap is updated and all major search engines that support the sitemap protocol, like ASK.com, Google, MSN Search and Yahoo!, are notified about the update."

If you are not using WordPress, creating a sitemap is still straightforward: visit xml-sitemaps.com to generate your sitemap and upload the resulting sitemap.xml file to your root directory. After uploading the file you will need to inform the search engines that you have a sitemap to read:

GOOGLE SITEMAP SUBMISSION:

Google provides a dedicated portal for webmasters through their Webmaster Tools. It is necessary to sign up and verify ownership of the website by way of a provided meta tag similar to the example below. Once verified, instructions are provided for submitting a sitemap.

meta name="verify-v1" content="Unique Identifier"

YAHOO SITEMAP SUBMISSION:

Yahoo, like Google, provides a dedicated portal for webmasters through Yahoo! Site Explorer. As with Google, it is necessary to sign up and verify ownership of the website by way of a provided meta tag similar to the example below. Once verified, instructions are provided for submitting a sitemap.

meta name="y_key" content="Unique Identifier"

MSN SEARCH SITEMAP SUBMISSION:

MSN requests you place the location of your sitemap within your robots.txt file.

User-agent: *
Sitemap: http://www.yourdomain.com/sitemap.xml
Disallow:

BING SITEMAP SUBMISSION:

Bing makes it easy with a URL submission similar to Ask.com's. Be sure to change "www.doc4design.com" to your own URL, then visit the link for submission.

http://www.bing.com/webmaster/ping.aspx?sitemap=www.doc4design.com/sitemap.xml

ASK.COM SITEMAP SUBMISSION:

Change the following URL so that "www.doc4design.com" reflects your own domain, then visit the link for submission:

http://submissions.ask.com/ping?sitemap=http%3A//www.doc4design.com/sitemap.xml

 

Prevent Access to Folders

Once we begin to upload folders and files, it is a good idea to prevent unscrupulous individuals from snooping around in them. Such folders might include an images folder or a downloads folder. Accessing an unprotected folder is as easy as typing the absolute URL to its location, for example http://www.doc4design.com/images. If that folder were accessible, you would see a list of every image file within it and could download them to your computer. Preventing access to these folders also closes a few backdoors into the site.

Preventing access is accomplished by creating an index.html file, which can be blank or can contain a message informing the user they shouldn't be there. Once the file is ready, upload it to each folder you would like to protect.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
 <head>
  <title>403 Forbidden</title>
 </head>
 <body>
  <h1>Forbidden</h1>
  <p>You don't have permission to access /wp-content/plugins/ on this server.</p>
 </body>
</html>
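
If your server runs Apache and allows .htaccess overrides, an alternative worth considering is turning off directory listings altogether rather than dropping an index.html file into every folder. A minimal sketch, added to the .htaccess file in your root directory:

# Disable automatic directory listings for this folder and all subfolders
Options -Indexes

With this directive in place, a request for a folder that has no index file returns a 403 Forbidden error instead of a list of the folder's contents.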

 

Select a !DOCTYPE

The doctype declaration should be the first piece of code in any HTML document, even before the <html> tag appears. A doctype declaration is a set of instructions given to the web browser declaring the markup language used; it specifies rules instructing the browser on how to render the page correctly. Much has been written about the doctype declaration, and a good source of information is A List Apart's article "Fix Your Site With the Right DOCTYPE!" For an extensive list of available doctypes, visit the W3C website: Recommended DTDs to use in your Web document. Not including a doctype can throw a browser into quirks mode. Wikipedia explains this best in its Quirks Mode article:

"To maintain compatibility with the greatest possible number of web pages, modern web browsers are generally developed with multiple rendering modes: in 'standards mode' pages are rendered according to the HTML and CSS specifications, while in 'quirks mode' attempts are made to emulate the behavior of older browsers. Some browsers (those based on Mozilla's Gecko rendering engine, or Internet Explorer 8 in strict mode, for example) also use an 'almost standards' mode which attempts to compromise between the two, implementing one quirk for table cell sizing while otherwise conforming to the specifications."

We typically include the following doctype within our pages:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

 

Specify a Language

In addition to the doctype, we need to specify the language of the website. In our case the primary language is English ("en-US"); for a full list of codes visit tlt.its.psu.edu. If you are using a specialized language, it is also important to specify the text direction: left to right (LTR) or right to left (RTL).

<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
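
For a right-to-left language such as Arabic, the same attributes are simply flipped; the values below are illustrative:

<html xmlns="http://www.w3.org/1999/xhtml" dir="rtl" lang="ar">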

 

Meta Tags

AUTHOR:

The author tag provides search engines with information about the author of the document. Typical usage may include the name and email address of the webmaster:

meta name="author" content="Webmaster: webmaster@email.com"

COPYRIGHT:

The copyright tag provides search engines with information regarding copyright, trademark, patent or other information pertaining to intellectual property. Typical usage:

meta name="copyright" content="copyright 2005-2009"

AUDIENCE:

The audience tag provides search engines and software with information for parental control. Typical usage:

meta name="audience" content="All"

REVISIT-AFTER:

The revisit-after tag provides search engines information on when they should revisit the site. Typical usage:

meta name="revisit-after" content="3 days"

DESCRIPTION:

The description tag provides search engines with the brief site description displayed within search engine results. It is highly recommended this be included. Typical usage:

meta name="description" content="<span>Advertising and Marketing agency located in California</span>"

KEYWORDS:

It is not uncommon to think of meta tags as obsolete, considering Google has officially discounted their usefulness. That statement generally refers to a website's meta "keywords". Keywords are terms used by search engines to assist in determining the site's content.

From Wikipedia "The keywords attribute was popularized by search engines such as Infoseek and AltaVista in 1995, and its popularity quickly grew until it became one of the most commonly used meta elements. By late 1997, however, search engine providers realized that information stored in meta elements, especially the keywords attribute, was often unreliable and misleading, and at worst, used to draw users into spam sites. (Unscrupulous web masters could easily place false keywords into their meta elements in order to draw people to their site.) Search engines began dropping support for metadata provided by the meta element in 1998, and by the early 2000s, most search engines had veered completely away from reliance on meta elements. In July 2002, AltaVista, one of the last major search engines to still offer support, finally stopped considering them. No consensus exists whether or not the keywords attribute has any effect on ranking at any of the major search engines today. It is speculated that it does, if the keywords used in the meta can also be found in the page copy itself. With respect to Google, thirty-seven leaders in search engine optimization concluded in April 2007 that the relevance of having your keywords in the meta-attribute keywords is little to none and in September 2009 Matt Cutts of Google announced that they are no longer taking keywords into account whatsoever. However, both these articles suggest that Yahoo! still makes use of the keywords meta tag in some of its rankings. Yahoo! itself claims support for the keywords meta tag in conjunction with other factors for improving search rankings."

meta name="keywords" content="advertising, design, articles"

 

We've Only Just Begun

The Carpenters weren't kidding. This article covers the initial building blocks for starting out strong when building a site, but it is important to remember that good design, strong coding skills, social marketing, and many other factors come into play to create a well-rounded website. We hope this article has helped at least a few of you out there, and to reiterate, here is a quick setup sheet showing the doctype, language, sitemap verification, and meta tags in place:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head>

<!-- Google Verify -->
<meta name="verify-v1" content="Unique Identifier" />

<!-- Yahoo Verify -->
<meta name="y_key" content="Unique Identifier" />

<meta name="author" content="Webmaster: webmaster@email.com" />
<meta name="copyright" content="copyright 2005 - 2009">
<meta name="audience" content="all">
<meta name="revisit-after" content="3 days" />
<meta name="description" content="<span>Advertising and Marketing agency located in California</span>" />
<meta name="keywords" content="<span>advertising, design, articles</span>" />

</head>
Dale Crum

Owner / Creative Director at Doc4 Design