Every web developer has their own style, from the way they write their CSS to how they structure their JavaScript, but there are some standards they generally all follow. These standards allow us to quickly and easily generate code that benefits the client when laying the foundation for a website. These standardized tools have been developed over time, ever since search engines began to categorize the World Wide Web in the 1990s. Unfortunately, they are often overlooked by web developers who either don't understand them or don't realize they exist.
We begin by deciding whether to use the 'www' in the site URL. There are strong cases on both sides, but we will not be covering the debate in depth here; if you would like to read more on this topic, take a look at the article "To www or Not to www — That is the Question" for an overview.
From Matt Farina: "Back in the early days of the Internet the 'www.' mattered. It was a way of getting you to the right content on the right server. It was a necessary thing. The early days of the Internet are long gone and the need for this fell off the charts long ago. But, support has remained, and now when Google indexes your site it indexes the 'www' version and the non-'www' version. The open-source content management system Joomla! is an example of this hurting a site's ranking. The site www.joomla.org recently took a hit in the rankings while joomla.org stayed high. The reason for this is that they are seen as two separate sites and the sites were ranked differently by the Google wizardry."
Even though we won't be making the decision for you, we typically elect to remove the "www" because, from our perspective, it's an outdated URL structure: the majority of web users recognize a URL by its '.com', '.net', etc. domain extension. For example, when visiting https://www.doc4design.com the user is redirected to https://doc4design.com. We want traffic redirected to a single domain, and by adding this redirection we ensure that search engines only see and catalog a single version of the site. If the site retains user information such as login names and passwords, having a single canonical version also prevents storing two copies of that data.
Whichever choice you make, try to keep it consistent across all platforms including printed and interactive materials.
Open the .htaccess file located in your root directory and add one of the code blocks below: the first adds the 'www', the second removes it. Be sure to change 'doc4design' to your own domain. If you do not have an .htaccess file, create one by opening a text editor of your choice (Notepad, TextEdit), add the code you prefer from below, and save the file as "htaccess.txt". Upload the file to the root directory of your site and then rename it to ".htaccess". We rename it after uploading because files prefixed with a "." are hidden on most systems.
# Option 1: redirect doc4design.com to www.doc4design.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^doc4design\.com$ [NC]
RewriteRule ^(.*)$ http://www.doc4design.com/$1 [L,R=301]

# Option 2: redirect any www hostname to the non-www version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
It's safe to assume all search engine spiders crawling a website have an innate need to catalog as much as they can. This includes everything from personal files to those eBay images you may have secretly added to the company File Transfer Protocol (FTP) site. If this sounds unnerving, it should. Luckily, there is a way to tell the spiders not to catalog everything by default. The first file a search engine spider looks for when visiting a website is the robots.txt file. This is a plain text file, similar to the .htaccess file, which can be created and opened in the simplest of programs, from Notepad to TextEdit. The robots.txt file is simply a list of instructions informing the spiders what they can and cannot do with the files and folders on the server. In the case above, it would be a good idea to keep the spiders out of your personal folders, and setting up a robots.txt file is easy.
The examples below will give you an idea of the kinds of rules you might include. If you do not have a robots.txt file, create one by opening the text editor of your choice (Notepad, TextEdit), add your chosen rules from below, and save the file as "robots.txt". Upload the file to the root directory of your site and you're all set. Note that you will want to change 'personal' or 'eBay' to the names of your own folders; these are only examples.
# Example 1: allow all robots to crawl the entire site
User-agent: *
Disallow:

# Example 2: keep all robots out of the /personal/ and /eBay/ folders
User-agent: *
Disallow: /personal/
Disallow: /eBay/

# Example 3: block a single page
User-agent: *
Disallow: /personal/mypage.html
Even if you would like the entire site to be cataloged by default, that doesn't mean you should skip the robots.txt file. It should still be included to prevent your server-side logs from filling with thousands of 404 errors (the error you see when a page is not found, for example addthis.com/qwefkujn). At a minimum, install a blank robots.txt file in your root directory. For more, see the Wikipedia article on the robots exclusion standard.
Sitemaps provide several benefits, including ease of navigation and better visibility to search engines. They are the means of informing a search engine that your site exists or that changes have been made that warrant a second look. If you do not provide a sitemap your site will eventually be indexed, but at a much slower pace. The sitemap should be generated once the website is complete, with the goal of including every link and page. If you are using WordPress, generating a sitemap.xml and sitemap.xml.gz file is easily accomplished with a single plugin: Google XML Sitemaps for WordPress.
"This plugin will create a Google sitemap compliant XML-Sitemap of your WordPress blog. It supports all of the WordPress-generated pages as well as custom ones. Every time you edit or create a post, your sitemap is updated and all major search engines that support the sitemap protocol, like Google, Bing, or Yahoo! are notified of the update."
If you are not using WordPress, creating a sitemap is still straightforward: visit a site like xml-sitemaps.com to generate one and upload the file to your root directory. After uploading the file, you will need to inform the search engines that you have a sitemap to read:
Google provides a dedicated portal for this through Google Search Central. It is necessary to sign up and verify ownership of the website by way of a provided meta tag similar to the example below. Once verified, you are given instructions for submitting a sitemap.
meta name="verify-v1" content="Unique Identifier"
Yahoo no longer provides for website or sitemap submissions as they are now rolled into Bing Webmaster Tools. If you've submitted to Bing, then you've submitted to Yahoo. According to Yahoo, "You can manage how your website appears in Yahoo Search by using meta tags and your robots.txt file. Yahoo Search results come from the Yahoo web crawler (Slurp) and Bing’s web crawler."
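The meta tags Yahoo refers to here are the standard robots meta tags, which any page can carry in its head section. As one common example (the noindex/nofollow combination is just an illustration, not a recommendation for your pages), a page that should be left out of search results entirely might include:
<meta name="robots" content="noindex, nofollow" />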
Bing uses a similar URL submission through its 'Bing Webmaster Tools' ping address. Be sure to change "doc4design.com" to your own URL, then visit the link to submit:
http://www.bing.com/webmaster/ping.aspx?sitemap=doc4design.com/sitemap.xml
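Another widely supported way to advertise a sitemap, in addition to submitting it directly, is to reference it from the robots.txt file we set up earlier. The major engines that honor the sitemap protocol recognize a single Sitemap line (shown here with our placeholder domain):
Sitemap: http://doc4design.com/sitemap.xml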
Once we begin to upload folders and files, it is a good idea to prevent unscrupulous individuals from snooping around in them. Such folders include the https folder, image folders, or even a downloads folder. Browsing an unprotected folder is as easy as typing the absolute URL to its location, for example, https://doc4design.com/images. If that folder allowed directory listing, you would see a list of every image file inside it and could download them at will. Preventing these listings also helps close up backdoors into the site.
One simple fix is to place an index.html file in the folder; it can be blank or contain a message informing the visitor they shouldn't be there. Once the file is ready, upload it to each folder you would like to protect, as in the example below.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>403 Forbidden</title>
</head>
<body>
<h1>Forbidden</h1>
<p>You don't have permission to access /wp-content/plugins/ on this server.</p>
</body>
</html>
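If your host runs Apache and allows .htaccess overrides, an alternative (or complement) to dropping index.html files into every folder is to switch off directory listings for the whole site from the root .htaccess file. This is a standard Apache directive, though whether it is permitted depends on your host's configuration:
# Disable automatic directory listings site-wide
Options -Indexes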
The doctype declaration should be the first piece of code in any HTML document, even before the <html> tag appears. A doctype declaration is a set of instructions given to the web browser declaring the markup language used; it specifies the rules the browser should follow to render the page correctly. Much has been written about the doctype declaration, and a good source of information is A List Apart's article "Fix Your Site With the Right DOCTYPE!" For an extensive list of available doctypes, visit the W3C page "Recommended DTDs to use in your Web document". Not including a doctype can throw a browser into quirks mode. Wikipedia explains this best in its Quirks Mode article:
"To maintain compatibility with the greatest possible number of web pages, modern web browsers are generally developed with multiple rendering modes: in "standards mode" pages are rendered according to the HTML and CSS specifications, while in "quirks mode" attempts are made to emulate the behavior of older browsers. Some browsers (those based on Mozilla's Gecko rendering engine, or Internet Explorer 8 in strict mode, for example) also use an "almost standards" mode which attempts to compromise between the two, implementing one quirk for table cell sizing while otherwise conforming to the specifications."
We typically include the following doctype within our pages:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
In addition to the doctype, we need to specify the language of the website. In our case the primary language is English ("en-US"); for a full list of codes, visit Google Search Central. If the language calls for it, it is also important to specify the text direction: left to right (ltr) or right to left (rtl).
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
The author tag provides search engines with information about the author of the document. Typical usage may include the name and email address of the webmaster:
meta name="author" content="Webmaster: [email protected]"
The copyright tag provides search engines with information regarding copyright, trademark, patent or other information pertaining to intellectual property. Typical usage:
meta name="copyright" content="copyright 2005-2009"
The audience tag provides search engines and software with information for parental control. Typical usage:
meta name="audience" content="All"
The revisit-after tag provides search engines information on when they should revisit the site. Typical usage:
meta name="revisit-after" content="3 days"
The description tag provides search engines with a brief site description displayed within search engine results. It is highly recommended this be included. Typical usage:
meta name="description" content="<span>Advertising and Marketing agency located in California</span>"
It is not uncommon to think of meta tags as obsolete, considering Google has officially discounted their usefulness. That statement generally refers to a website's meta "keywords": terms intended to help search engines determine the site's content.
From Wikipedia "The keywords attribute was popularized by search engines such as Infoseek and AltaVista in 1995, and its popularity quickly grew until it became one of the most commonly used meta elements. By late 1997, however, search engine providers realized that information stored in meta elements, especially the keywords attribute, was often unreliable and misleading, and at worst, used to draw users into spam sites. (Unscrupulous webmasters could easily place false keywords into their meta elements in order to draw people to their site.) Search engines began dropping support for metadata provided by the meta element in 1998, and by the early 2000s, most search engines had veered completely away from reliance on meta elements. In July 2002, AltaVista, one of the last major search engines to still offer support, finally stopped considering them. No consensus exists whether or not the keywords attribute has any effect on ranking at any of the major search engines today. It is speculated that it does, if the keywords used in the meta can also be found in the page copy itself. With respect to Google, thirty-seven leaders in search engine optimization concluded in April 2007 that the relevance of having your keywords in the meta-attribute keywords is little to none and in September 2009 Matt Cutts of Google announced that they are no longer taking keywords into account whatsoever. However, both these articles suggest that Yahoo! still makes use of the keywords meta tag in some of its rankings. Yahoo! itself claims support for the keywords meta tag in conjunction with other factors for improving search rankings."
meta name="keywords" content="advertising, design, articles"
This article covers the initial building blocks for starting a site out strong, but it is important to remember that good design, strong coding skills, social marketing, and many other factors come into play to create a well-rounded website. We hope this article has helped at least a few of you out there, and to recap, here is a quick setup sheet showing how to implement the doctype, language, sitemap verification, and meta tags:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head>
<!-- Google Verify -->
<meta name="verify-v1" content="Unique Identifier" />
<!-- Yahoo Verify -->
<meta name="y_key" content="Unique Identifier" />
<meta name="author" content="Webmaster: [email protected]" />
<meta name="copyright" content="copyright 2005 - 2030">
<meta name="audience" content="all">
<meta name="revisit-after" content="3 days" />
<meta name="description" content="<span>Advertising and Marketing agency located in California</span>" />
<meta name="keywords" content="<span>advertising, design, articles</span>" />
</head>