sitemap.pl Script
Note: "cgi-bin" directory
The instructions in this documentation assume that your web server has a directory named cgi-bin where scripts are stored. Some web servers use a different name for this directory, such as cgi-local, cgi, mainwebsite_cgi, or something similar. In that case, wherever you see cgi-bin in an instruction, substitute the directory name that your web server uses.
Follow these steps to install the sitemap.pl script:
(it might be easier if you print out this documentation, or at least the first two or three pages, and check off each completed step).
Download the sitemap.pl .zip file to your computer.
Unzip the .zip file. You should end up with a directory sitemap-YYMMDD (such as sitemap-080723).
Open your FTP program and connect to your web server.
Go into your cgi-bin directory and create a subdirectory named sitemap.
Go into the newly created subdirectory sitemap.
On your local computer, go into the sitemap-YYMMDD directory.
Upload sitemap.pl to your web server. Note: Upload the file using ASCII transfer mode. If necessary, change the first line of sitemap.pl to the path of perl on your web server (default is: /usr/local/bin/perl).
Select the sitemap.pl file on your web server and do CHMOD 755 (User: read/write/execute; Group: read/execute; Other: read/execute) so that sitemap.pl can be run.
On your web server, go to your root directory (the directory where your home page file is located).
On your local computer, open the htaccess-rules.txt file that came in the .zip file. Select and copy all those statements. Then edit the .htaccess file located on your web server. Paste these sitemap-related statements at the start of your .htaccess file and save it to your web server.
Note: If your web server does not have a .htaccess file, you can upload htaccess-rules.txt and then rename it to .htaccess (the name starts with a dot and does not end in .txt).
Note: If you are using WordPress or any other publishing system that has virtual files, put the sitemap-related statements before the statements of your publishing system. For example, put the sitemap-related statements before the WordPress-related statements. In general, if you already have a statement RewriteRule ^.*$ (i.e., one that matches everything), it should be kept at the end of your .htaccess file.
To test that sitemap.pl is installed successfully, open your web browser and access DOMAIN.com/sitemap.txt (substitute your own domain name in this URL). You should see a list of the files on your website.
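The .htaccess step above comes down to placing the sitemap statements ahead of everything else in the file. A minimal sketch of that merge follows; the function name `prepend_rules` is hypothetical, and the rule text is whatever your copy of htaccess-rules.txt contains (a placeholder comment stands in for it here).

```python
def prepend_rules(existing: str, rules: str) -> str:
    """Return .htaccess content with the sitemap statements placed first."""
    rules = rules.strip("\n")
    if rules and rules in existing:
        # The statements are already present; do not add them twice.
        return existing
    if not existing.strip():
        # No .htaccess content yet: the rules become the whole file.
        return rules + "\n"
    # Sitemap statements first, then the publishing system's statements.
    return rules + "\n\n" + existing.lstrip("\n")


if __name__ == "__main__":
    current = "RewriteRule ^.*$ index.php\n"       # e.g. a catch-all rule
    sitemap_rules = "# sitemap rules (placeholder)\n"
    print(prepend_rules(current, sitemap_rules))
```

Because the catch-all RewriteRule stays below the inserted block, the ordering requirement in the note above is preserved.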
500 Internal Server Error
If you see a "500 Internal Server Error" then follow these steps:
Redownload the .zip file to your Windows PC and unzip it again.
Reupload the sitemap.pl file using ASCII transfer mode (not binary mode!).
Select sitemap.pl on the web server and do CHMOD 755 (User: read/write/execute; Group: read/execute; Other: read/execute).
If necessary, change the first line of sitemap.pl so it has the correct path to perl on your web server (default is /usr/local/bin/perl; try: /usr/bin/perl). If you're not sure, look in any other .pl file on your web server that works, or ask your hosting company what is the path to perl.
Test the installation by accessing DOMAIN.com/sitemap.txt again. Press your web browser's Refresh/Reload button.
If you see any files in the sitemap list that you prefer not be listed, then create a configuration file as indicated in the "Configuration File" section below.
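The checks in the "500 Internal Server Error" steps above (line endings, the perl path on the first line, and the 755 permission) can be sketched as a small script. This assumes you have shell access to a Unix-like web server where Python is available; the function name `check_script` is hypothetical and is not part of sitemap.pl.

```python
import os
import stat
from typing import List


def check_script(path: str) -> List[str]:
    """Return a list of likely causes of a 500 error for a CGI script."""
    problems = []
    with open(path, "rb") as f:
        data = f.read()
    # A binary-mode upload from Windows leaves carriage returns in the file.
    if b"\r" in data:
        problems.append("carriage returns found; re-upload in ASCII mode")
    # The first line must be a #! line pointing at perl.
    first_line = data.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        problems.append("first line is not a #! path to perl")
    # CHMOD 755 corresponds to octal permission bits 0o755.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o755:
        problems.append("mode is %o, expected 755" % mode)
    return problems


if __name__ == "__main__":
    # Hypothetical location; adjust to where sitemap.pl lives on your server.
    target = "cgi-bin/sitemap/sitemap.pl"
    if os.path.exists(target):
        for problem in check_script(target):
            print(problem)
```

The sketch does not verify that the perl path on the first line actually exists; compare it with a working .pl file or ask your hosting company, as described above.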
In your web server's root directory, edit your robots.txt file and add the following line at the end (substitute your own domain name):
sitemap: http://DOMAIN.com/sitemap.xml
Note: If you do not have a robots.txt file, run the Windows Notepad editor, create a robots.txt file containing the above line (substitute your own domain name), and upload it to your web server's root directory.
(optional) For instructions from Google.com on how to submit your Google Sitemap to Google.com, see: "How do I submit a Sitemap?" If you have set up your robots.txt file as indicated above, it is not necessary to submit your sitemap to Google; however, by submitting it, you gain access to some useful GoogleBot crawler status information from Google.com. Therefore, we recommend that you do submit your Google Sitemap to Google.com.
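The robots.txt step above can be sketched as a small helper that appends the sitemap line only when it is missing. The function name `add_sitemap_line` is hypothetical, and example.com stands in for your own domain.

```python
def add_sitemap_line(robots_txt: str, domain: str) -> str:
    """Return robots.txt content that ends with the sitemap line."""
    line = "sitemap: http://%s/sitemap.xml" % domain
    existing = [entry.strip().lower() for entry in robots_txt.splitlines()]
    if line.lower() in existing:
        # Already present; leave the file untouched.
        return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + line + "\n"


if __name__ == "__main__":
    print(add_sitemap_line("User-agent: *\nDisallow:\n", "example.com"))
```

Passing an empty string covers the case where no robots.txt file exists yet.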
sitemap.pl is distributed as shareware. If you find sitemap.pl to be useful and continue to use it, please purchase a license; it's a one-time fee of only $14.95.
Note: sitemap.pl v10.03.09-beta is required if you want a log file. Earlier versions do not generate logs. Upgrade.
When you run sitemap.pl, it creates a log file sitemap-log.txt (located in the same directory as sitemap.pl).
Download or view that log file using your FTP program and you will see why sitemap.pl is including or excluding each of your files.
An optional configuration file sitemap-skip.txt (located in the same directory as sitemap.pl) can be used to tell sitemap.pl which files to exclude from the sitemap.
The sitemap.pl script has a built-in list of common exclusions (e.g., it skips all .gif files). Thus you need to create a sitemap-skip.txt file only if sitemap.pl is listing files that you do not want listed in the sitemap.
The built-in exclusions are:
#-- specific files --
/postinfo.html
#-- patterns --
/core.?
/google?.html
*/HEADER.html
*/README.html
*_
#-- directories --
/cgi-bin/
/log/
/logs/
/private/
/webalizer/
#-- extensions --
.asa .bak .bat .bmp .css .csv .db .dll .exe .gif .gz .ico .ini .jpeg .jpg .js
.mdb .mid .mp3 .mpeg .mpg .msi .pdf .pl .png .psp .rar .rm .sql .swf .tar .temp
.tgz .tif .tiff .tmp .txt .url .xls .xml .xsd .xsl .wav .wma .wmv .xbm .zip
Tip: View Built-in Exclusions List
You can view the built-in exclusions list by accessing: DOMAIN.com/cgi-bin/sitemap/sitemap.pl?skip
Comments and Whitespace: (spaces, tabs, #)
Comments are indicated with # and are discarded. Comments can be on lines by themselves or on the same line as a statement. Everything from the # to the end of the line is discarded.
You can freely use spaces and/or tabs throughout. Leading and trailing spaces and/or tabs are discarded; thus you can indent statements and comments. Blank lines are ignored. Multiple consecutive spaces and/or tabs (whitespace) are treated as a single space; thus you can use any spacing or tabbing you prefer.
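To illustrate the comment and whitespace rules above, here is a minimal sketch of how a sitemap-skip.txt file could be reduced to its entries. It is an illustration of the stated rules, not the parser sitemap.pl actually uses; the function name `parse_skip_text` is hypothetical.

```python
def parse_skip_text(text):
    """Return the entries of a skip file, ignoring comments and whitespace."""
    entries = []
    for raw_line in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        statement = raw_line.split("#", 1)[0]
        # split() with no arguments drops leading/trailing whitespace and
        # treats any run of spaces and tabs as a single separator.
        entries.extend(statement.split())
    return entries


if __name__ == "__main__":
    sample = "# my exclusions\n   .doc\t.docx    # office files\n\n/projects/\n"
    print(parse_skip_text(sample))
```

Blank lines and comment-only lines contribute nothing, so they can be used freely to organize the file.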
Filename Ending: (.ending)
To discard all files with a particular filename ending (e.g., all .gif files), type .gif into your sitemap-skip.txt file.
Note: You do not have to type in any filename ending that is part of the built-in exclusions list; those are already taken care of for you. Most of the common filename endings are already in there.
You can put more than one filename ending on the same line; simply separate each one with a tab or a space. To keep the file organized, it's recommended that you keep your filename endings sorted alphabetically (but you don't have to).
Directory: (/directory/ and /directory)
To skip an entire directory, type that directory into your sitemap-skip.txt file.
For example, /projects/ would cause the entire /projects/ directory to be excluded.
For example, /projects (no trailing /) would also cause the entire /projects/ directory to be excluded.
For example, /data/private/ would cause the entire /data/private/ sub-directory to be excluded. The /data/ directory itself would still be included.
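The three directory examples above can be sketched as a prefix test. This is an illustration under assumptions, not sitemap.pl's own logic: the function name `in_excluded_dir` is hypothetical, and the sketch assumes a rule without a trailing / is treated exactly like the same rule with one.

```python
def in_excluded_dir(url_path, rule):
    """Return True if url_path lies inside the directory named by rule."""
    # Treat "/projects" and "/projects/" the same way.
    prefix = rule if rule.endswith("/") else rule + "/"
    return url_path.startswith(prefix)


if __name__ == "__main__":
    print(in_excluded_dir("/projects/2008/plan.html", "/projects"))   # inside
    print(in_excluded_dir("/data/index.html", "/data/private/"))      # outside
```

Normalizing to a trailing / before comparing keeps a rule such as /projects from also matching an unrelated path like /projects-old/.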
Patterns: (* and ?)
You can use * and ? to specify pattern matches.
The * matches zero or more occurrences of any character, including: A .. Z, a .. z, 0 .. 9, and all symbols including / (the directory character).
The ? is similar to the * match pattern character; however, ? does not match the / character. This is an important distinction since ? enables pattern matches such as /google?.html which would match /google-adsense.html but not /google/adsense.html (because ? does not match the / character).
For example, */HEADER.html would cause HEADER.html located in any directory (including the root directory) to be excluded.
For example, */private/ would cause a directory named private located in any directory to be excluded.
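The difference between * and ? described above can be sketched by translating a pattern into a regular expression. This is only an illustration of the stated rules; the expression sitemap.pl really builds can be viewed with the skipre parameter. The function names are hypothetical, and the sketch assumes a pattern must match the whole URL path.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a skip pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")        # any characters, including /
        elif ch == "?":
            parts.append("[^/]*")     # any characters except /
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def pattern_matches(pattern, url_path):
    """Return True if the whole url_path matches the pattern."""
    return wildcard_to_regex(pattern).fullmatch(url_path) is not None


if __name__ == "__main__":
    print(pattern_matches("/google?.html", "/google-adsense.html"))
    print(pattern_matches("/google?.html", "/google/adsense.html"))
```

With this translation, */HEADER.html also matches /HEADER.html in the root directory, because * is allowed to match zero characters.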
File: (/directory/file.ending)
You can exclude a particular file by simply stating it.
For example, to exclude /google.html, simply type /google.html into sitemap-skip.txt.
For example, to exclude /google/adsense.html, simply type /google/adsense.html into sitemap-skip.txt.
For example, /*/HEADER.html would cause HEADER.html located in any sub-directory (but not the root directory) to be excluded.
Ignore a Built-in Exclusion
To override a built-in exclusion, type the exclusion into your sitemap-skip.txt file and prefix it with a plus sign (+).
For example, if you want to override the exclusion of .pdf files and thus have .pdf files appear in the sitemap, then add the following line to your sitemap-skip.txt file:
+.pdf
Note: There must not be any spaces after the +
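The override rule above can be sketched as set arithmetic on the exclusion list: a plain entry adds an exclusion, and an entry prefixed with + removes a built-in one. This is a simplified illustration, not sitemap.pl's implementation; the function name `apply_entries` is hypothetical and the built-in set shown is only a small excerpt.

```python
def apply_entries(builtin, entries):
    """Return the active exclusions after applying skip-file entries."""
    active = set(builtin)
    for entry in entries:
        if entry.startswith("+"):
            # "+.pdf" cancels the built-in ".pdf" exclusion.
            active.discard(entry[1:])
        else:
            active.add(entry)
    return active


if __name__ == "__main__":
    builtin_excerpt = {".gif", ".pdf", "/cgi-bin/"}
    print(sorted(apply_entries(builtin_excerpt, ["+.pdf", "/projects/"])))
```

Since nothing may separate the + from the exclusion, the sketch simply strips the first character.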
The created XML sitemap file complies with the Sitemap Protocol 0.9 as defined by sitemaps.org.
sitemap.pl automatically includes the optional <lastmod>, <changefreq>, and <priority> tags as part of each <url>. Currently, there is no configuration option to turn off generation of these tags; they are always generated.
The value of the <lastmod> tag is the last-modified timestamp of the URL (directory or file).
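The source does not state which date format sitemap.pl writes into <lastmod>. As a hedged illustration, the Sitemap protocol accepts W3C Datetime values, and a file's last-modified timestamp could be rendered in that form as follows (the function name `lastmod_value` is hypothetical, and UTC is an assumption of this sketch).

```python
from datetime import datetime, timezone


def lastmod_value(mtime_seconds):
    """Format a last-modified time (seconds since the epoch) for <lastmod>."""
    moment = datetime.fromtimestamp(mtime_seconds, tz=timezone.utc)
    # W3C Datetime with seconds and the "Z" (UTC) designator.
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(lastmod_value(0))
```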
The value of the <priority> tag is based on the URL:
<priority> Tag Value    URL
1.0                     Home page
0.8                     Directory (at any depth)
0.6                     File
Thus the order of priority is: home page, directories, files.
Note: There are no configuration variables to change these values.
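The priority rule above can be sketched as a three-way test on the URL path. The function name `priority_for` is hypothetical, and the sketch assumes that directory URLs end with / while file URLs do not.

```python
def priority_for(url_path):
    """Return the <priority> value for a URL path, per the rule above."""
    if url_path == "/":
        return "1.0"   # home page
    if url_path.endswith("/"):
        return "0.8"   # directory at any depth
    return "0.6"       # file


if __name__ == "__main__":
    for path in ["/", "/products/", "/products/widget.html"]:
        print(path, priority_for(path))
```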
The value of the <changefreq> tag is based on the URL's last-modified timestamp, except for the home page, which is always daily.
<changefreq> Value    Condition
daily                 Home page
weekly                Modified within 2 months
monthly               Modified within 6 months
yearly                Modified within 3 years
never                 Older than 3 years
Note: There are no configuration variables to change these values or conditions.
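The changefreq conditions above can be sketched as a chain of age thresholds. The source gives the thresholds in months and years only, so this sketch assumes 30-day months and 365-day years, and its handling of exact boundary values is a guess; the function name `changefreq_for` is hypothetical.

```python
def changefreq_for(url_path, age_days):
    """Return the <changefreq> value for a URL last modified age_days ago."""
    if url_path == "/":
        return "daily"            # the home page is always daily
    if age_days <= 2 * 30:        # assumed: 2 months = 60 days
        return "weekly"
    if age_days <= 6 * 30:        # assumed: 6 months = 180 days
        return "monthly"
    if age_days <= 3 * 365:       # assumed: 3 years = 1095 days
        return "yearly"
    return "never"


if __name__ == "__main__":
    for age in [10, 100, 400, 2000]:
        print(age, changefreq_for("/page.html", age))
```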
To view the version number of sitemap.pl, access sitemap.pl with the version parameter:
DOMAIN.com/cgi-bin/sitemap/sitemap.pl?version
To view the built-in exclusion list, access sitemap.pl with the skip parameter:
DOMAIN.com/cgi-bin/sitemap/sitemap.pl?skip
Advanced users: To view the regular expression built from wildcard exclusions (* and ?), access sitemap.pl with the skipre parameter:
DOMAIN.com/cgi-bin/sitemap/sitemap.pl?skipre