Knowledge Base

URL Pruning

URL Pruning lets the ClickTracks Server delete a portion of a URL that you don't want to analyze. This feature is useful for normalizing the various URLs in your log files. The portion to delete can be specified as either a fixed string or a regular expression.

There are two main uses for URL Pruning:

Remove a Portion of a URL - In some log files, the URLs /file.html and /sitename/file.html refer to the same file. You can use URL Pruning to delete the portion "/sitename" from all URLs of this form.

Remove a Session ID - Sometimes a session id (a dynamically generated string) is embedded in the URL. You can delete it with this feature.



Simple Examples:


Remove a fixed URL portion:

Your log file may contain an old portion of the URL left over from a prior version of the website:

/megasite/OLD/customer.html   (old version of URL)
/megasite/OLD/areacode.html
....
/megasite/customer.html       (new version of URL)
/megasite/areacode.html

In the URL Pruning dialog, enter:

/OLD

Save and (re)process your logfiles.

This will normalize your URLs to only contain the values:

/megasite/customer.html
/megasite/areacode.html
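
The effect of the fixed-string rule above can be approximated outside ClickTracks with a short Python sketch. The function name prune_fixed is hypothetical, and the sketch assumes the string is deleted wherever it occurs in the URL; the ClickTracks Server may apply the rule differently.

```python
def prune_fixed(url: str, fragment: str) -> str:
    """Delete every occurrence of a fixed fragment from a URL.

    Assumption: the fragment is removed wherever it appears. This only
    illustrates the idea and is not the ClickTracks implementation.
    """
    return url.replace(fragment, "")


logged = [
    "/megasite/OLD/customer.html",
    "/megasite/OLD/areacode.html",
    "/megasite/customer.html",
    "/megasite/areacode.html",
]

# Old and new forms collapse into the same normalized URLs.
normalized = sorted({prune_fixed(u, "/OLD") for u in logged})
print(normalized)
```

Running this on the four example URLs leaves only the two normalized values listed above.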


URLs can also be pruned using the more advanced regular expression syntax. Regular expressions are often known as regexps and are familiar to web developers through programming languages such as Perl and PHP.

Regular expressions are useful if you want to remove a variable string, for example a session id, from your URLs. Suppose your URLs look like

 http://www.example.com/catalog/pineapple.html/102-0590433-8620953

where the last part is a session id which you want to remove. Then you could specify the regular expression

  /[0-9]{3}-[0-9]{7}-[0-9]{7}$

to remove the session id from the end.
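
To check a pattern like this before entering it, you can try it against a sample URL in any regular expression engine. The following is a minimal Python sketch; the function name prune_session_id is hypothetical, and Python's re module may differ in details from the regular expression engine used by the ClickTracks Server.

```python
import re

# The session id pattern from the example above: a slash followed by
# 3 digits, 7 digits, and 7 digits separated by hyphens, at the end of the URL.
SESSION_ID = re.compile(r"/[0-9]{3}-[0-9]{7}-[0-9]{7}$")


def prune_session_id(url: str) -> str:
    """Remove a trailing session id from a URL (illustrative only)."""
    return SESSION_ID.sub("", url)


url = "http://www.example.com/catalog/pineapple.html/102-0590433-8620953"
print(prune_session_id(url))
```

The $ at the end of the pattern anchors the match to the end of the URL, so a similar-looking string in the middle of a URL is left alone.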

Note: See http://support.clicktracks.com/clicktracks/pruning_and_matching_test.php for a very useful tool for constructing regular expressions.

More information on regular expressions and syntax can be found through Google. A good primer is at

 http://aspn.activestate.com/ASPN/docs/Expect_for_Windows/1.0/regex.html