
Making your BlogCFC Google “Sitemap’able”


UPDATE 1/3/2005: After a brief comment from the CF_Jedi himself, I’ve found out that the Google Sitemap function has already been added to BlogCFC (and named the exact same thing: googleSitemap.cfm). Crazy me, for upgrading so late and not reading the documentation all the way through. Nonetheless, I’ll still try to add in the categories, and maybe eventually some type of local site spidering to find all valid URLs.

Google Sitemaps … have you heard of ‘em?
In a nutshell, they’re an experimental mechanism (still in beta, but hey, so is Gmail, and how many folks are using that?) for letting the Google crawlers index your site more efficiently, both for which pages exist and for when they change. It’s a free service provided by Google that’s well worth a look. To get started with Google Sitemaps, you’ll first need to generate a Sitemap (really nothing more than an XML file listing the links from your site) that conforms to Google’s XML standard. At first that looked like an incredibly tedious and tough task to do by hand, considering that, at least for now, all my content is dynamic. Google does offer a Sitemap Generator, but it requires that your web server have Python installed (I don’t have that option), and I’m not certain whether it can generate the dynamic links.

I’m no Jedi Master, only a simple Padawan, but I figured I’d test out my powers on working through this one. So here’s what I came up with.

Seeing as how I’m using Ray’s BlogCFC for my site, I figured there must be a way to automate creating a sitemap, using all my posts and static pages as the content for my map. So I added a "generateGoogleSiteMap" function to the BlogCFC component, which pretty much does all the work:

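```cfml
<!--- Returns a Google Sitemap (XML) built from hand-added links plus every blog entry --->
<cffunction name="generateGoogleSiteMap" access="public" returnType="string" output="false">
    <cfargument name="changefreq" type="string" required="false" default="monthly">

    <cfset var entries = "">
    <cfset var dateStr = "">
    <cfset var sitemap = "">

    <!---
    Each node in the map follows Google's format:

    <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>

    FYI: priority is only in relation to other URLs.
    But since we're cycling through all the postings, it won't really matter for the dynamic links.
    However, for the links I add manually, it will.
    --->

    <!--- Assumed query: table, columns, and instance keys follow BlogCFC's schema; adjust to your install --->
    <cfquery name="entries" datasource="#instance.dsn#">
        select   id, posted
        from     tblblogentries
        where    blog = <cfqueryparam value="#instance.name#" cfsqltype="cf_sql_varchar">
        order by posted desc
    </cfquery>

    <cfsavecontent variable="sitemap"><cfoutput><?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
    <!--- here I can manually add links to the map --->
    <url>
        <loc>#application.rooturl#/mylink.cfm</loc>
        <lastmod>2005-12-30</lastmod>
        <changefreq>#arguments.changefreq#</changefreq>
        <priority>0.5</priority>
    </url>
    <!--- here I add the articles to the map --->
    <cfloop query="entries">
        <cfset dateStr = dateFormat(posted, "yyyy-mm-dd")>
        <url>
            <loc>#xmlFormat(makeLink(id))#</loc>
            <lastmod>#dateStr#</lastmod>
            <changefreq>#arguments.changefreq#</changefreq>
            <priority>0.5</priority>
        </url>
    </cfloop>
</urlset></cfoutput></cfsavecontent>

    <cfreturn trim(sitemap)>
</cffunction>
```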

So it’s pretty much self-explanatory and conforms to Google’s protocol. Next, I just created a file called "GoogleSiteMap.cfm":

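```cfml
<!--- Assumed default; pass whatever change frequency suits your site --->
<cfset changefreq = "monthly">

<cftry>
    <!--- Serve the map as XML --->
    <cfcontent type="text/xml"><cfoutput>#application.blog.generateGoogleSiteMap(changefreq=changefreq)#</cfoutput>

    <!--- Ray's standard error catch --->
    <cfcatch>
        <cfoutput>
        #application.resourceBundle.getResource("type")#=#cfcatch.type#
        <hr />
        #application.resourceBundle.getResource("message")#=#cfcatch.message#
        <hr />
        #application.resourceBundle.getResource("detail")#=#cfcatch.detail#
        </cfoutput>
        <!--- Logic is - if they filtered incorrectly, revert to default, if not, abort (only the abort is shown here) --->
        <cfabort>
    </cfcatch>
</cftry>
```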

Once this is done, you can test the file by visiting "http://[siteroot]/GoogleSiteMap.cfm" to make sure it works. Now follow Google’s instructions for submitting your sitemap using the link you just tested, add your verification file to the webroot (or blogroot), and wait for the map to be verified. It took about 7.5 hours for mine to complete. Also, just so you know, a sitemap can’t be larger than 10MB or contain more than 50,000 links. If yours would exceed either limit, it will need to be split up into smaller sitemaps.
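If you ever did hit the 50,000 link ceiling, one way to split things up would be a sitemap index that points at several smaller maps. Here is a rough sketch of that idea, not something I have running: the function name "generateSitemapIndex", its arguments, and the "page" URL parameter are all made up for illustration, and GoogleSiteMap.cfm would need to be taught to serve just the requested slice.

```cfml
<!--- Hypothetical helper, not part of BlogCFC: builds a sitemap index whose
      entries each point at one slice of at most maxPerMap links --->
<cffunction name="generateSitemapIndex" access="public" returnType="string" output="false">
    <cfargument name="totalLinks" type="numeric" required="true">
    <!--- Assumed: URL of a page that can serve one slice of the map --->
    <cfargument name="pageUrl" type="string" required="true">
    <cfargument name="maxPerMap" type="numeric" required="false" default="50000">

    <!--- Number of smaller sitemaps needed to cover every link --->
    <cfset var pages = ceiling(arguments.totalLinks / arguments.maxPerMap)>
    <cfset var i = 0>
    <cfset var result = "">

    <cfsavecontent variable="result"><cfoutput><?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<cfloop index="i" from="1" to="#pages#">
    <sitemap>
        <!--- "page" is an invented URL parameter naming the slice --->
        <loc>#xmlFormat(arguments.pageUrl & "?page=" & i)#</loc>
    </sitemap>
</cfloop>
</sitemapindex></cfoutput></cfsavecontent>

    <cfreturn trim(result)>
</cffunction>
```

Calling it with 120,000 links and the default limit would give three sitemap entries, since 120,000 / 50,000 rounds up to 3. The 10MB limit would still need its own check, because this only counts links.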

What’s next? Well, I’ll add in the categories as nodes for the sitemap, too, using a URL rewrite’able format (i.e., http://[urlroot]/cat/id[categorynumber]), and try to figure out a way to crawl the whole site and generate the sitemap completely on the fly. Considering the 50,000/10MB limitation above (which I can’t ever imagine reaching), I’m also thinking about providing an option to create smaller sitemaps based on categories. Just thoughts…
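For the category nodes, something along these lines might do as a starting point. It is only a sketch under a few assumptions: "generateCategoryNodes" is a name I invented, the category numbers are passed in as a plain comma-separated list rather than pulled from BlogCFC, and the 0.5 priority simply mirrors what the article links use.

```cfml
<!--- Hypothetical helper: returns one <url> node per category, using the
      http://[urlroot]/cat/id[categorynumber] link format --->
<cffunction name="generateCategoryNodes" access="public" returnType="string" output="false">
    <cfargument name="rootUrl" type="string" required="true">
    <!--- Assumed input: comma-separated list of category numbers --->
    <cfargument name="categoryIds" type="string" required="true">
    <cfargument name="changefreq" type="string" required="false" default="monthly">

    <cfset var id = "">
    <cfset var nodes = "">

    <cfsavecontent variable="nodes"><cfoutput>
<cfloop list="#arguments.categoryIds#" index="id">
    <url>
        <loc>#xmlFormat(arguments.rootUrl & "/cat/id" & id)#</loc>
        <changefreq>#xmlFormat(arguments.changefreq)#</changefreq>
        <priority>0.5</priority>
    </url>
</cfloop>
    </cfoutput></cfsavecontent>

    <cfreturn trim(nodes)>
</cffunction>
```

The returned string would be dropped inside the urlset element of the main map, for example by outputting generateCategoryNodes(application.rooturl, "1,2,3") right after the article loop.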
