sitemap.pl Script
Note: "cgi-bin" directory
The instructions in this documentation assume that your web server has a directory named cgi-bin where scripts are stored. Some web servers use a different name for this directory, such as cgi-local, cgi, or mainwebsite_cgi. In that case, wherever you see cgi-bin in an instruction, substitute the directory name that your web server uses.
Follow these steps to install the sitemap.pl script:
(it might be easier if you print out this documentation, or at least the first two or three pages, and check off each completed step).
Download the sitemap.pl .zip file to your computer.
Unzip the .zip file. You should end up with a directory sitemap-YYMMDD (such as sitemap-080723).
Open your FTP program and connect to your web server.
Go into your cgi-bin directory and create a subdirectory named sitemap.
Go into the newly created subdirectory sitemap.
On your local computer, go into the sitemap-YYMMDD directory.
Upload sitemap.pl to your web server. Note: Upload the file using ASCII transfer mode. If necessary, change the first line of sitemap.pl to the path of perl on your web server (default is: /usr/local/bin/perl).
Select the sitemap.pl file on your web server and do CHMOD 755 (User: read/write/execute; Group: read/execute; Other: read/execute) so that sitemap.pl can be run.
On your web server, go to your root directory (the directory where your home page file is located).
On your local computer, open the htaccess-rules.txt file that came in the .zip file. Select and copy all of its statements. Then edit the .htaccess file located on your web server. Paste these sitemap-related statements at the start of your .htaccess file and save it to your web server.
Note: If your web server does not have a .htaccess file, you can upload htaccess-rules.txt and then rename it to .htaccess (the name starts with a dot and has no .txt ending).
Note: If you are using WordPress or any other publishing system that has virtual files, put the sitemap-related statements before the statements of your publishing system. For example, put the sitemap-related statements before the WordPress-related statements. In general, if you already have a statement RewriteRule ^.*$ (i.e., one that matches everything), it should be kept at the end of your .htaccess file.
To test that sitemap.pl is installed successfully, open your web browser and access DOMAIN.com/sitemap.txt (substitute your own domain name in this URL). You should see a list of the files on your website.
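The .htaccess step above comes down to an ordering rule: the sitemap-related statements go first, and anything already in the file follows. As a rough illustration only, the Python sketch below applies that ordering to plain strings. The function name prepend_sitemap_rules and the placeholder comment lines are hypothetical; the real statements are whatever htaccess-rules.txt contains.

```python
def prepend_sitemap_rules(existing_htaccess: str, sitemap_rules: str) -> str:
    """Return .htaccess content with the sitemap rules placed first."""
    rules = sitemap_rules.strip("\n")
    existing = existing_htaccess.strip("\n")
    if not existing:
        # No .htaccess yet: the rules become the whole file.
        return rules + "\n"
    if rules in existing:
        # Rules were already pasted in; avoid adding them twice.
        return existing + "\n"
    # Sitemap rules first, then the statements that were already there
    # (for example WordPress-related statements or a catch-all RewriteRule).
    return rules + "\n\n" + existing + "\n"


if __name__ == "__main__":
    # Placeholder text only; not the actual contents of htaccess-rules.txt.
    print(prepend_sitemap_rules("# WordPress-related statements\n",
                                "# sitemap-related statements\n"))
```

Working on strings keeps the sketch independent of FTP details; the result would still need to be saved as .htaccess on the web server.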
500 Internal Server Error
If you see a "500 Internal Server Error" then follow these steps:
Redownload the .zip file to your Windows PC and unzip it again.
Reupload the sitemap.pl file using ASCII transfer mode (not binary mode!).
Select sitemap.pl on the web server and do CHMOD 755 (User: read/write/execute; Group: read/execute; Other: read/execute)
If necessary, change the first line of sitemap.pl so it has the correct path to perl on your web server (default is /usr/local/bin/perl; try: /usr/bin/perl). If you're not sure, look in any other .pl file on your web server that works, or ask your hosting company what is the path to perl.
Test the installation by accessing DOMAIN.com/sitemap.txt again. Press your web browser's Refresh/Reload button.
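Some of the checks in the steps above can be run on your local copy of sitemap.pl before uploading. The Python sketch below is a hypothetical helper, not part of the sitemap.pl distribution. It assumes that stray carriage returns (Windows line endings that survive a binary-mode upload) and a wrong perl path on the first line are the problems worth detecting, and it prints what CHMOD 755 looks like as a permission string.

```python
import stat


def find_upload_problems(script_bytes: bytes,
                         perl_path: str = "/usr/local/bin/perl") -> list:
    """Return a list of likely causes of a 500 error (empty if none found)."""
    problems = []
    if b"\r" in script_bytes:
        # Carriage returns usually mean the file kept Windows line endings.
        problems.append("carriage returns found; re-upload in ASCII mode")
    first_line = script_bytes.split(b"\n", 1)[0].rstrip(b"\r")
    first_line = first_line.decode("ascii", "replace")
    if not first_line.startswith("#!"):
        problems.append("first line is not a #! line")
    elif first_line[2:].split()[:1] != [perl_path]:
        # Compare only the interpreter path, ignoring options such as -w.
        problems.append("first line does not point to " + perl_path)
    return problems


def describe_mode(mode: int) -> str:
    """Render numeric permissions the way a directory listing shows them."""
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    print(find_upload_problems(b"#!/usr/bin/perl\r\nprint 1;\r\n"))
    print(describe_mode(0o755))  # user rwx, group r-x, other r-x
```

Pass the perl path your hosting company gives you as the second argument if it differs from the default.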
If you see any files in the sitemap list that you prefer not be listed, then create a configuration file as indicated in the "Configuration File" section below.
In your web server's root directory, edit your robots.txt file and add the following line at the end (substitute in your own domain name):
Note: If you do not have a robots.txt file, run the Windows Notepad editor and create a robots.txt with the above line (substitute in your own domain name) and upload it to your web server's root directory.
(optional) For instructions from Google.com on how to submit your Google Sitemap to Google.com, see: "How do I submit a Sitemap?" If you have set up your robots.txt file as indicated above, it is not necessary for you to submit your sitemap to Google; however, by submitting your sitemap to Google, you gain access to some useful GoogleBot crawler status information from Google.com. Therefore, we recommend that you do submit your Google Sitemap to Google.com
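For the robots.txt step above, the sitemaps.org protocol defines a robots.txt line of the form "Sitemap: " followed by the full URL of the sitemap. The Python sketch below appends such a line if it is not already present. The function name and the URL http://www.example.com/sitemap.xml are assumed placeholders; use the sitemap address that applies to your own domain.

```python
def add_sitemap_line(robots_txt: str, sitemap_url: str) -> str:
    """Return robots.txt content ending with a Sitemap line for sitemap_url."""
    line = "Sitemap: " + sitemap_url
    lines = robots_txt.splitlines()
    if line in (existing.strip() for existing in lines):
        # The line is already there; leave the file unchanged.
        return robots_txt
    lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder URL (assumption); substitute your own domain name.
    print(add_sitemap_line("User-agent: *\nDisallow:\n",
                           "http://www.example.com/sitemap.xml"))
```

Calling the function on an empty string gives the one-line robots.txt described in the note for sites that do not have the file yet.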
sitemap.pl is distributed as shareware. If you find sitemap.pl to be useful and continue to use it, please purchase a license; it's a one-time fee of only $9.95.
Note: sitemap.pl v10.03.09-beta is required if you want a log file. Earlier versions do not generate logs, so upgrade if you are running one.
When you run sitemap.pl, it creates a log file sitemap-log.txt (located in the same directory as sitemap.pl).
Download or view that log file using your FTP program and you will see why sitemap.pl is including or excluding each of your files.
An optional configuration file sitemap-skip.txt (located in the same directory as sitemap.pl) can be used to tell sitemap.pl which files to exclude from the sitemap.
The sitemap.pl script has a built-in list of common exclusions (e.g., skip all .gif files). Thus you need to create a sitemap-skip.txt only if sitemap.pl is listing files that you do not want in the sitemap.
#-- specific files --
/postinfo.html
#-- patterns --
/core.? /google?.html */HEADER.html */README.html *_
#-- directories --
/cgi-bin/ /log/ /logs/ /private/ /webalizer/
#-- extensions --
.asa .bak .bat .bmp .css .csv .db .dll .exe .gif .gz .ico .ini .jpeg .jpg .js
.mdb .mid .mp3 .mpeg .mpg .msi .pdf .pl .png .psp .rar .rm .sql .swf .tar .temp
.tgz .tif .tiff .tmp .txt .url .xls .xml .xsd .xsl .wav .wma .wmv .xbm .zip
Tip: View Built-in Exclusions List
You can view the built-in exclusions list by accessing:
Comments and Whitespace: (spaces, tabs, #)
Comments are indicated with # and are discarded. Comments can be on lines by themselves or on the same line as a statement. Everything from the # to the end of the line is discarded.
You can freely use spaces and/or tabs throughout. Leading and trailing spaces and/or tabs are discarded; thus you can indent statements and comments. Blank lines are ignored. Multiple consecutive spaces and/or tabs (whitespace) are treated as a single space; thus you can use any spacing or tabbing you prefer.
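The comment and whitespace rules above can be pictured with a short Python sketch. It is only an approximation of how sitemap.pl might read sitemap-skip.txt, written from the rules stated here; the function name parse_skip_text is hypothetical.

```python
def parse_skip_text(text: str) -> list:
    """Return the entries of a sitemap-skip.txt style text, in order."""
    entries = []
    for raw_line in text.splitlines():
        # Everything from # to the end of the line is a comment.
        statement = raw_line.split("#", 1)[0]
        # split() with no argument drops leading/trailing whitespace and
        # treats any run of spaces and tabs as a single separator.
        entries.extend(statement.split())
    return entries


if __name__ == "__main__":
    sample = "#-- extensions --\n  .bak\t.tmp   # temporary files\n\n/private/\n"
    print(parse_skip_text(sample))
```

Blank lines and comment-only lines contribute nothing, so indentation and spacing are purely a matter of taste.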
Filename Ending: (.ending)
To discard all files with a particular filename ending (e.g., all .gif files), type .gif into your sitemap-skip.txt file. Note: You do not have to type in any filename ending that is part of the built-in exclusions list; those are already taken care of for you. Most of the common filename endings are already in there.
You can put more than one filename ending on the same line; simply separate each one by a tab or a space. To keep the file organized, it's recommended that you keep your filename endings sorted alphabetically (but you don't have to).
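As a small illustration of the filename-ending rule, the Python sketch below tests a URL path against a list of endings. The function name is hypothetical, and the comparison is assumed to be case-sensitive, which this documentation does not state.

```python
def has_skipped_ending(url_path: str, endings: list) -> bool:
    """True if url_path ends with one of the listed filename endings."""
    # str.endswith accepts a tuple and returns True if any item matches.
    return url_path.endswith(tuple(endings))


if __name__ == "__main__":
    endings = [".bak", ".gif"]  # kept in alphabetical order, as recommended
    print(has_skipped_ending("/images/logo.gif", endings))
    print(has_skipped_ending("/index.html", endings))
```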
Directory: (/directory/ and /directory)
To skip an entire directory, type that directory into your sitemap-skip.txt file.
For example, /projects/ would cause the entire /projects/ directory to be excluded.
For example, /projects (no trailing /) would cause the entire /projects/ directory to be excluded.
For example, /data/private/ would cause the entire /data/private/ sub-directory to be excluded. The /data/ directory itself would still be included.
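The directory examples above can be read as a prefix test on the URL path. The Python sketch below works under that assumption; the function name is hypothetical, and how sitemap.pl actually tells a directory entry from a file entry is not described here.

```python
def in_skipped_directory(url_path: str, directory: str) -> bool:
    """True if url_path lies inside the given directory entry."""
    # /projects and /projects/ are treated the same: normalize to one slash.
    prefix = directory.rstrip("/") + "/"
    return url_path.startswith(prefix)


if __name__ == "__main__":
    print(in_skipped_directory("/projects/plan.html", "/projects"))
    print(in_skipped_directory("/data/index.html", "/data/private/"))
```

Normalizing to a trailing slash also keeps /projects from accidentally matching a sibling such as /projects-old/.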
Patterns: (* and ?)
You can use * and ? to specify pattern matches.
The * matches zero or more occurrences of any character, including: A .. Z, a .. z, 0 .. 9, and all symbols including / (the directory character).
The ? is similar to the * match pattern character; however, ? does not match the / character. This is an important distinction since ? enables pattern matches such as /google?.html which would match /google-adsense.html but not /google/adsense.html (because ? does not match the / character).
For example, */HEADER.html would cause HEADER.html located in any directory (including the root directory) to be excluded.
For example, */private/ would cause a directory named private located in any directory to be excluded.
You can exclude a particular file by simply stating it.
For example, to exclude /google.html, simply type /google.html into your sitemap-skip.txt file.
For example, to exclude /google/adsense.html, simply type /google/adsense.html into your sitemap-skip.txt file.
For example, /*/HEADER.html would cause HEADER.html located in any sub-directory (but not the root directory) to be excluded.
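The * and ? rules above translate naturally into a regular expression. The Python sketch below is one possible translation, inferred from the examples in this section: * becomes "any characters" and ? becomes "any characters except /". It is not the expression sitemap.pl itself builds, it covers file patterns only, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a skip pattern using * and ? into a regular expression."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")      # zero or more of any character, / included
        elif char == "?":
            parts.append("[^/]*")   # like *, but never crosses a /
        else:
            parts.append(re.escape(char))  # everything else is literal
    return "".join(parts)


def matches(pattern: str, url_path: str) -> bool:
    """True if the whole url_path matches the skip pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), url_path) is not None


if __name__ == "__main__":
    print(matches("/google?.html", "/google-adsense.html"))
    print(matches("/google?.html", "/google/adsense.html"))
    print(matches("/*/HEADER.html", "/HEADER.html"))
```

Matching the whole path, rather than searching within it, is what makes /*/HEADER.html skip the root directory while */HEADER.html includes it.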
Ignore a Built-in Exclusion
To override a built-in exclusion, type the exclusion into your sitemap-skip.txt file and prefix it with a plus sign (+).
For example, if you want to override the exclusion of .pdf files and thus have .pdf files appear in the sitemap, then add the line +.pdf to your sitemap-skip.txt file.
Note: There must not be any spaces after the +
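The effect of a + entry can be sketched as set arithmetic: start from the built-in exclusions, remove whatever is prefixed with +, and add everything else. The Python below is an illustration under that reading; the function name and the tiny built-in set are hypothetical stand-ins.

```python
def apply_overrides(built_in: set, entries: list) -> set:
    """Combine built-in exclusions with sitemap-skip.txt entries."""
    active = set(built_in)
    for entry in entries:
        if entry.startswith("+"):
            # +.pdf cancels the built-in .pdf exclusion (no space after +).
            active.discard(entry[1:])
        else:
            active.add(entry)
    return active


if __name__ == "__main__":
    print(sorted(apply_overrides({".gif", ".pdf"}, ["+.pdf", "/private/"])))
```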
The created XML sitemap file complies with the Sitemap Protocol 0.9 as defined by sitemaps.org
sitemap.pl automatically includes the optional <lastmod>, <changefreq>, and <priority> tags as part of each <url>. Currently, there is no configuration option to turn off generation of these tags; they are always generated.
The value of the <lastmod> tag is the last-modified timestamp of the URL (directory or file).
The value of the <priority> tag is based on the URL:
<priority> Tag Value    URL
1.0                     Home page
0.8                     Directory (at any depth)
0.6                     File
Thus the order of priority is: home page, directories, files.
Note: There are no configuration variables to change these values.
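The priority rule above is simple enough to restate as a few lines of Python. The sketch assumes that the home page is the path / and that a directory URL is any other path ending in /; both conventions are assumptions made for the example, as is the function name.

```python
def priority_for(url_path: str) -> str:
    """Return the <priority> value for a URL path."""
    if url_path == "/":
        return "1.0"   # home page
    if url_path.endswith("/"):
        return "0.8"   # directory, at any depth
    return "0.6"       # file


if __name__ == "__main__":
    for path in ("/", "/projects/", "/projects/plan.html"):
        print(path, priority_for(path))
```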
The value of the <changefreq> tag is based on the URL's last-modified timestamp, except for the home page, which is always daily.
<changefreq> Value    Condition
daily                 Home page
weekly                Modified within 2 months
monthly               Modified within 6 months
yearly                Modified within 3 years
never                 Older than 3 years
Note: There are no configuration variables to change these values or conditions.
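The changefreq conditions above can be expressed as a chain of age checks. The Python sketch below assumes a month is 30 days, a year is 365 days, and each limit is inclusive; sitemap.pl may measure these intervals differently, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def changefreq_for(last_modified: datetime, now: datetime,
                   is_home_page: bool = False) -> str:
    """Return the <changefreq> value for a URL."""
    if is_home_page:
        return "daily"                      # home page is always daily
    age = now - last_modified
    if age <= timedelta(days=2 * 30):       # within 2 months (assumed 30-day months)
        return "weekly"
    if age <= timedelta(days=6 * 30):       # within 6 months
        return "monthly"
    if age <= timedelta(days=3 * 365):      # within 3 years (assumed 365-day years)
        return "yearly"
    return "never"                          # older than 3 years


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    for days in (10, 100, 400, 2000):
        print(days, changefreq_for(now - timedelta(days=days), now))
```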
To view the version number of sitemap.pl, access sitemap.pl with the version parameter:
To view the built-in exclusion list, access sitemap.pl with the skip parameter:
Advanced users: To view the regular expression built from wildcard exclusions (* and ?), access sitemap.pl with the skipre parameter:
E.&O.E.; © Cusimano.Com Corporation; www.c3scripts.com