AWstats is a free, popular log analyzer, released under the GPL. It can generate advanced graphical statistics from web, streaming, ftp or mail server log files. This document is not intended to be a review, but rather a quick installation and configuration guide for a specific web site, in order to have as accurate statistical data as possible for use in your traffic analysis reports.
AWstats is actually a Perl script (awstats.pl), which parses your server’s log files and generates reports either dynamically, when used as a CGI script through the web browser, or by creating static HTML pages, when used directly from the command line or through cron. It also comes with some other helper Perl scripts to make this task even easier.
Article Goals
The goals of this document are to:
- Install AWstats in a custom location as a normal user. Although it is possible to have a system-wide installation, I chose this approach for completeness. The differences between the two methods are just the scripts’ locations. The rest of the configuration stays the same.
- Create a configuration file for our web site (Apache Virtual Host) for as accurate statistics as possible.
- Parse this host’s log file and create a database with statistical data.
- Use this statistical data to generate web site traffic reports. We will focus on the creation of static HTML reports, but some info on how to use awstats.pl as a CGI is also provided.
- Make a quick introduction to user-defined charts.
Prerequisites
This document assumes that:
- Our web site is configured to have its own log file.
- The log file is written in the "combined" format (NCSA combined/XLF/ELF log format)
- You have configured the Apache web server to do reverse DNS lookups (HostnameLookups On). This means that the log files contain the visitors’ hostnames instead of their IP addresses in the HOST field. This is not necessary though.
- We have access to the log file.
Actually, only the last one is a necessity, as AWstats can be configured to generate statistics even from heavily customized log formats. For this article we will use an Apache log file in the "combined" format.
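Before going further, it may help to confirm that the log really is in the "combined" format. The following is only a rough sketch: the regular expression is my own approximation of a combined log line (it does not handle escaped quotes inside fields), and the two sample lines are made up for the demonstration.

```shell
#!/bin/sh
# Rough shape of an NCSA "combined" log line:
# host ident user [date] "request" status bytes "referer" "user-agent"
COMBINED_RE='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'

# Print how many of the first 100 lines of a log file do NOT match.
count_non_combined() {
    head -n 100 "$1" | grep -Evc "$COMBINED_RE"
}

# Made-up sample data: the first line is "combined", the second is "common".
sample=$(mktemp)
cat > "$sample" <<'EOF'
host1.example.com - - [12/Nov/2005:10:15:32 +0200] "GET /index.php HTTP/1.1" 200 5120 "http://www.example.org/" "Mozilla/5.0"
192.168.0.5 - - [12/Nov/2005:10:15:40 +0200] "GET /style.css HTTP/1.1" 200 812
EOF
bad=$(count_non_combined "$sample")
echo "lines not in combined format: $bad"
rm -f "$sample"
```

To check a real log, call count_non_combined with the path of your own access log. A result of 0 suggests that LogFormat=1 is appropriate.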
Custom installation in our Home directory
We will install the AWstats package in our Home directory. So, download the latest awstats version from the Project Page and extract it:
# tar -xzvf awstats-X.X.tar.gz -C /home/jsmith/
A new directory (awstats-X.X) is created in our Home. This is where all scripts and other supplemental files are installed. You may want to rename this to just awstats:
# mv awstats-X.X awstats
We will need to create two more directories, one for the awstats statistical data and one for the traffic reports (static HTML pages). So, we create the first one inside the awstats installation directory:
# mkdir /home/jsmith/awstats/statdata
The directory which will hold the traffic reports can be located inside our web site’s root directory, so that they are accessible from a web browser. Assuming that our DocumentRoot is /home/jsmith/public_html/, we create a new directory in there:
# mkdir /home/jsmith/public_html/traffic
Using this installation scheme, we avoid exposing the awstats scripts to the internet. Only the traffic reports will be accessible through a web browser. This means that it will not be possible to use awstats.pl as a CGI script to generate reports dynamically (directly from our statistical data), but this behaviour can easily be changed.
We also need to copy some images, which are used in the HTML or PDF traffic reports, to the traffic directory:
# cp -R /home/jsmith/awstats/wwwroot/icon/ /home/jsmith/public_html/traffic/
The last part of the installation process is to set the appropriate permissions to the AWstats directories and files. So, we set the mode to 0755 for directories and 0644 for all files. Because the Perl scripts (*.pl files) need to be executable, we set their mode to 0755. The following lines do all this:
# find /home/jsmith/awstats/ -type d | xargs chmod 0755
# find /home/jsmith/awstats/ -type f | xargs chmod 0644
# find /home/jsmith/awstats/ -type f -name '*.pl' | xargs chmod 0755
We also need to set the appropriate permissions to the directory which will hold the reports and which will be accessible from the internet:
# find /home/jsmith/public_html/traffic/ -type d | xargs chmod 0705
# find /home/jsmith/public_html/traffic/ -type f | xargs chmod 0604
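The find and xargs pipelines above can misbehave if a file name contains spaces. As a hedged alternative, the sketch below applies the same permission scheme with find's own -exec option. It is not part of the AWstats package: the set_tree_modes helper is hypothetical, and it runs on a scratch directory that merely stands in for /home/jsmith/awstats.

```shell
#!/bin/sh
# Apply one mode to all directories and another to all files of a tree.
# $1 = tree, $2 = directory mode, $3 = file mode
set_tree_modes() {
    find "$1" -type d -exec chmod "$2" {} +
    find "$1" -type f -exec chmod "$3" {} +
}

# Scratch tree standing in for /home/jsmith/awstats (names are made up).
tree=$(mktemp -d)
mkdir -p "$tree/wwwroot/cgi-bin" "$tree/dir with space"
touch "$tree/wwwroot/cgi-bin/awstats.pl" "$tree/dir with space/notes.txt"

set_tree_modes "$tree" 0755 0644
# Perl scripts must be executable. The pattern is quoted so that the
# shell does not expand it before find sees it.
find "$tree" -type f -name '*.pl' -exec chmod 0755 {} +

exec_scripts=$(find "$tree" -type f -perm 0755 | wc -l | tr -d ' ')
plain_files=$(find "$tree" -type f -perm 0644 | wc -l | tr -d ' ')
echo "executable scripts: $exec_scripts, plain files: $plain_files"
rm -rf "$tree"
```

For the reports directory, the same helper could be called with the modes 0705 and 0604.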
That’s enough with the installation.
Configuration
We need to create a configuration file for our web site. This file will be read by awstats.pl in order to generate the statistical data or the traffic reports. There is a sample configuration file in the /home/jsmith/awstats/wwwroot/cgi-bin/ directory, named awstats.model.conf. We make a copy of this file in the same directory and replace the "model" part of the name with one that will represent our web site:
# cp awstats.model.conf awstats.mysite.conf
We will work on the copy. Although modifying only a few basic directives, such as the logfile path and the statistical data directory path, would be enough, we will modify some more, so that our statistics are as accurate as possible and our reports look the way we want.
Open the awstats.mysite.conf file in your favourite text editor and let’s start customizing it.
Note: Wherever a path is needed, I suggest using an absolute path rather than a relative one.
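If you prefer to script the changes instead of using a text editor, something along these lines might work. This is only a sketch: set_directive is a hypothetical helper of mine, not an AWstats tool, and the starting file content below is made up for the demonstration.

```shell
#!/bin/sh
# Replace the value of an existing "Name=value" directive, or append it.
# $1 = config file, $2 = directive name, $3 = value written verbatim
# Limitation: the value must not contain "|", "&" or backslashes.
set_directive() {
    if grep -q "^$2 *=" "$1"; then
        sed "s|^$2 *=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Scratch file standing in for awstats.mysite.conf (content is made up).
conf=$(mktemp)
printf '%s\n' 'LogFile="/var/log/httpd/mylog.log"' 'DNSLookup=2' > "$conf"

set_directive "$conf" LogFile '"/home/jsmith/logs/access_log"'
set_directive "$conf" DirData '"/home/jsmith/awstats/statdata"'
result=$(cat "$conf")
printf '%s\n' "$result"
rm -f "$conf"
```

Directives that may legitimately appear more than once, such as LoadPlugin, are better edited by hand, since this helper rewrites every line that starts with the given name.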
LogFile="/home/jsmith/logs/access_log"
LogType=W
LogFormat=1
LogSeparator=" "
These are log file specific directives. If your log file is in the "combined" format, all you have to modify is its path.
SiteDomain="www.mysite.com"
HostAliases="mysite.com"
Here we set our web site’s URL and all the aliases that can be used to reach the site with a web browser. Separate all aliases with a "space".
DNSLookup=0
By setting this directive to 0, no reverse DNS lookup requests will be sent to the nameserver. I have set the Apache web server to do these lookups, so a value of 0 is the proper one. You can set this to 1, which will lead to numerous lookup requests to the nameserver, or 2, which will make awstats do the resolving by examining a DNS cache file, if it exists. Keep in mind that having awstats do the reverse DNS lookups will slow the statistics update process dramatically.
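If you are not sure whether your web server already resolves hostnames, a quick look at the first field of the log can tell. The sketch below counts IPv4 addresses versus everything else in that field. The host_field_summary helper and the sample lines are made up for illustration, and IPv6 addresses would be counted as hostnames by this simple test.

```shell
#!/bin/sh
# Count how many entries carry an IPv4 address and how many carry
# something else (normally a hostname) in the first field of a log file.
host_field_summary() {
    awk '{
        if ($1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) ip++; else name++
    } END { printf "ip=%d hostname=%d\n", ip, name }' "$1"
}

# Made-up sample log, for demonstration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
host1.example.com - - [12/Nov/2005:10:15:32 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.168.0.5 - - [12/Nov/2005:10:16:01 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
pc2.example.net - - [12/Nov/2005:10:17:45 +0200] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
EOF
summary=$(host_field_summary "$sample")
echo "$summary"
rm -f "$sample"
```

A log that contains mostly hostnames suggests that DNSLookup=0 is fine, and that SkipHosts should be given hostnames rather than IP addresses.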
DirData="/home/jsmith/awstats/statdata"
Set the directory where awstats will keep its statistical data. This is one of the directories we had created in the installation process.
DirCgi="/home/jsmith/awstats/wwwroot/cgi-bin"
This is the directory that contains the awstats.pl script.
DirIcons="icon"
Remember that we had previously copied the awstats icons to the directory which will hold our reports? That’s why we do not need to specify an absolute path for these. Just set it to icon.
CreateDirDataIfNotExists=0
If you had previously created the directory which will hold the statistical data, then a value of 0 will do. Otherwise set it to 1 to have the directory you have specified in the DirData directive created.
KeepBackupOfHistoricFiles=1
It’s a good habit to have awstats keep a backup of the historic data during the update process.
DefaultFile="index.php"
Here we define the index file for our web site. In other words, our home page. This depends on your site.
SkipHosts="OUR OWN PCs' HOSTNAMES"
This is a very important directive regarding the accuracy of the statistics. Usually, we are our web site’s most regular visitor and it’s obvious that we do not want to be counted as a visitor. This directive can take IP addresses or hostnames as values, separated with a space. Regular expressions can be used in the form of REGEX[value]. IP addresses cannot be mixed with hostnames, so, if the DNS lookups take place at the web server level, then we have to use hostnames as values, otherwise we have to use IP addresses. Usually we need to set the IPs or hostnames of all our LAN computers or computers we use to edit the website, so that they are ignored. Below are some examples:
SkipHosts="localhost REGEX[^.*\.example\.dyndns\.org$] test.mysite.com windowspc1"
OR
SkipHosts="127.0.0.1 REGEX[^192\.168\.] REGEX[^10\.]"
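It can be useful to try a pattern against a few hosts before putting it inside REGEX[...]. The sketch below uses grep -E as a stand-in. AWstats itself evaluates these patterns with Perl, so this is only a rough check that happens to behave the same for simple patterns such as the ones above. The matches_skip_regex helper and the host list are made up.

```shell
#!/bin/sh
# Return success if the host matches the pattern (the part inside REGEX[...]).
# $1 = pattern, $2 = host
matches_skip_regex() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

skipped=""
for host in 192.168.1.20 10.0.0.7 193.168.1.20 110.1.2.3; do
    if matches_skip_regex '^192\.168\.' "$host" ||
       matches_skip_regex '^10\.' "$host"; then
        skipped="$skipped $host"
    fi
done
echo "would be skipped:$skipped"
```

Note how the anchored pattern ^10\. leaves 110.1.2.3 alone, which would not be the case without the leading ^.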
SkipUserAgents=""
If you use any custom spiders or bots to test or analyze your web site, but you don’t want their access to be included in the stats, then you should add their "User Agent String" as a value to this directive. Again, regular expressions can be used in the form of REGEX[value].
NotPageList="css js class gif jpg jpeg png bmp ico swf"
This is another important directive. Here we set what file extensions will not be counted as Page Views or Downloads, but only as Hits. Usually, this list includes files that are part of a web page (images, stylesheets, flash animations, java applets etc.).
URLWithQuery=0
URLWithQueryWithOnlyFollowingParameters=""
URLWithQueryWithoutFollowingParameters=""
URLReferrerWithQuery=0
These do not need to be modified, unless you want the query string to be included in the web page URLs or Referrer URLs in the traffic reports. Enabling the URLWithQuery directive is important in the case your web page URLs are of the form /index.php?p=10, so that it is clear in the traffic reports which page was viewed. On the other hand, if your page URLs are of the above form, but you use permalinks, then there is no need to modify the default values for these directives.
Including the query string in the referrer URLs is not important, in fact it can lead to lengthy meaningless referrer lists, which is not so convenient. I just provide this info here for completeness.
LevelForWormsDetection=2
By default, the detection of worms that have crawled your web site is disabled. You may want to enable this by setting the above directive’s value to 2.
Lang="en"
If you need the reports to be in a specific language, set it here. A list of supported languages exists in the configuration file.
ShowAuthenticatedUsers=PHBL
If your reports are private, you may set this directive’s value to PHBL (details about what each letter represents can be found inside the conf file), so that a section with details about your web site’s authenticated users is included in the reports.
ShowWormsStats=HBL
If you had previously enabled the worm detection, then you may want to include a detailed section about worms in the reports.
IncludeInternalLinksInOriginSection=1
By setting this to 1, a summary of how many links to another internal page have been followed from your site’s pages is included in the reports.
MaxNbOfDomain = 10
MaxNbOfHostsShown = 10
MaxNbOfLoginShown = 10
MaxNbOfRobotShown = 10
MaxNbOfPageShown = 10
MaxNbOfOsShown = 10
MaxNbOfBrowsersShown = 10
MaxNbOfRefererShown = 10
MaxNbOfKeyphrasesShown = 10
MaxNbOfKeywordsShown = 10
Here follows some info about the reports. You can create only one main page with a summary of the web site’s traffic, but you can also generate some supplemental pages which have full lists of the visited pages, referrers, countries, search engines etc. Each section in the main page includes a predefined number of entries that are displayed. For example, it displays by default the top 10 referrers. This number can be customized by modifying the directives above.
ShowLinksOnUrl=1
By default, each URL shown in the reports is a clickable hyperlink. If you do not want them to be actual hyperlinks, then set this to 0.
MaxRowsInHTMLOutput=1000
DetailedReportsOnNewWindows=1
With these directives you set the number of entries each of the supplemental reports can have and if you want these supplemental reports to be opened in a new browser window.
LoadPlugin="tooltips"
LoadPlugin="decodeutfkeys"
The AWstats package includes some plugins you can enable. I found the above two to be helpful. The first one enables the display of some descriptive tooltips and the second one makes it possible to show keywords and keyphrases correctly using national characters.
There are some other interesting plugins inside the awstats package, but also some more from other contributors. You can find the latter at the project’s web site. Keep in mind that each plugin may require certain Perl modules to be installed.
Update the statistics database
Now that we have finished customizing our web site’s configuration file, we can finally have awstats.pl parse our log file and create statistical data:
# perl /home/jsmith/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -update -showcorrupted
Notice that we do not use the whole configuration file’s filename (awstats.mysite.conf) to define our configuration, but only the part between awstats. and .conf.
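To illustrate the naming rule, the following sketch lists the names that could be passed to -config for every awstats.*.conf file in a directory. The list_config_names helper is hypothetical and the scratch directory only stands in for /home/jsmith/awstats/wwwroot/cgi-bin.

```shell
#!/bin/sh
# List the -config names available in a directory of awstats.*.conf files.
list_config_names() {
    for f in "$1"/awstats.*.conf; do
        [ -e "$f" ] || continue   # the pattern matched nothing
        name=${f##*/}             # strip the directory part
        name=${name#awstats.}     # strip the leading "awstats."
        name=${name%.conf}        # strip the trailing ".conf"
        printf '%s\n' "$name"
    done
}

# Scratch directory with two empty stand-in configuration files.
dir=$(mktemp -d)
touch "$dir/awstats.model.conf" "$dir/awstats.mysite.conf"
names=$(list_config_names "$dir")
printf '%s\n' "$names"
rm -rf "$dir"
```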
The -showcorrupted option is not necessary. A total number of corrupted records would be displayed anyway. This just provides detailed info.
It would be convenient if you set cron to execute the above command on a daily or hourly basis. Here is a small BASH script that can be run through cron:
#! /bin/bash
# Update the statistics database
perl /home/jsmith/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -update -showcorrupted
# Calculate Total Visits for all months
TotVisits=$(grep ^TotalVisits /home/jsmith/awstats/statdata/*.txt | sed 's/^.*awstats.*TotalVisits.//' | awk '{sum += $1} END {print sum}')
# Export a small GIF image with the number of total visits
text2gif -t "$TotVisits" > /home/jsmith/public_html/traffic/counter.gif
# Set proper permissions on the GIF image
chmod 0604 /home/jsmith/public_html/traffic/counter.gif
exit 0
This small script updates the statistical data, calculates the total visits for all months and exports a small B&W GIF image which can be used as our custom counter in our web site. It’s not a real-time counter, but it’s better than nothing… Anyway, this is just an example. The text2gif utility is part of the libungif-progs package.
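The total-visits calculation can be tried out without touching real data. The sketch below does the same sum with a single awk call instead of the grep, sed and awk pipeline, which is my own substitution and not what the script above uses. The two data files are heavily simplified, made-up stand-ins for AWstats history files. They keep only the "TotalVisits number" line shape that the pipeline in the script relies on.

```shell
#!/bin/sh
# Sum the TotalVisits value of every history file in a data directory.
# Assumes at least one *.txt file exists in the directory.
total_visits() {
    awk '$1 == "TotalVisits" { sum += $2 } END { print sum + 0 }' "$1"/*.txt
}

# Made-up stand-ins for two monthly history files.
data=$(mktemp -d)
printf 'TotalVisits 120\nTotalUnique 80\n' > "$data/awstats112005.mysite.txt"
printf 'TotalVisits 95\nTotalUnique 60\n' > "$data/awstats122005.mysite.txt"

total=$(total_visits "$data")
echo "total visits: $total"
rm -rf "$data"
```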
Generate traffic reports
There are two methods to generate reports: either by using awstats.pl directly or by using a helper script, named awstats_buildstaticpages.pl.
To generate the main report for November 2005 using awstats.pl, you can issue the following command:
# perl /home/jsmith/awstats/wwwroot/cgi-bin/awstats.pl -config=mysite -month=11 -year=2005 -output -staticlinks > /home/jsmith/public_html/traffic/awstats.mysite.200511.html
If the options -month and -year are omitted, then the report is generated for the current month. You can also generate a report for a whole year, by setting these two options to -month=all and -year=2005.
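To build the main report for every month of a year, the command can be wrapped in a loop. The sketch below is a dry run: it only prints the commands, so that they can be reviewed first. The report_file helper is hypothetical and simply reproduces the file naming used in this guide.

```shell
#!/bin/sh
AWSTATS=/home/jsmith/awstats/wwwroot/cgi-bin/awstats.pl
OUTDIR=/home/jsmith/public_html/traffic

# Build the report file name, e.g. awstats.mysite.200511.html
# $1 = config name, $2 = year, $3 = month (1-12, one leading zero is tolerated)
report_file() {
    printf '%s/awstats.%s.%04d%02d.html\n' "$OUTDIR" "$1" "$2" "${3#0}"
}

# Dry run: print one command per month instead of executing it.
for month in 1 2 3 4 5 6 7 8 9 10 11 12; do
    mm=$(printf '%02d' "$month")
    out=$(report_file mysite 2005 "$month")
    echo "perl $AWSTATS -config=mysite -month=$mm -year=2005 -output -staticlinks > $out"
done
```

Once the printed commands look right, they can be run by piping the output of the script to sh.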
You can view the page with your web browser at: http://www.mysite.com/traffic/awstats.mysite.200511.html
Furthermore, you can create supplemental reports (lengthy lists of referrers, countries etc.) or even apply filters. This is covered in detail in the relevant section of the AWstats documentation.
A quick way to create full reports (main and supplemental pages) is to use the helper script, awstats_buildstaticpages.pl. This can be used in the following way:
# perl /home/jsmith/awstats/tools/awstats_buildstaticpages.pl -configdir=/home/jsmith/awstats/wwwroot/cgi-bin -config=mysite -awstatsprog=/home/jsmith/awstats/wwwroot/cgi-bin/awstats.pl -dir=/home/jsmith/public_html/traffic -month=11 -year=2005 -builddate=200511
Here is an explanation for some of the options:
-configdir: Sets the path of the directory which contains the configuration files.
-awstatsprog: Sets the path to the awstats.pl script.
-dir: Sets the directory where the report files should be saved.
-builddate: Adds month and year info in the report’s filename.
Again, if the options -month and -year are omitted, then the report is generated for the current month and year.
Other options that can be used are:
-update: Updates the awstats statistics database before generating any reports.
-buildpdf: Creates a PDF file, after the generation of the HTML pages is done.
In order to generate PDF files, the package htmldoc needs to be installed in the system.
It would be more convenient if you set cron to execute the above command.
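As a starting point, a crontab entry for a nightly run could look like the line printed by the sketch below. The wrapper script path is hypothetical (it stands for any script that contains the report command above), and the 04:30 schedule is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical wrapper script; adjust to wherever you saved your own.
SCRIPT=/home/jsmith/bin/awstats-reports.sh

# Fields: minute hour day-of-month month day-of-week command
CRON_LINE="30 4 * * * $SCRIPT >/dev/null 2>&1"
printf '%s\n' "$CRON_LINE"
```

The printed line can then be pasted into the editor opened by crontab -e.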
AWstats Extra Section Configuration
AWstats can be configured to include user-defined charts in the reports. These are defined in the "Extra Section" in the awstats.mysite.conf file. An explanation for each directive is included within the conf file. Here I provide two working examples, together with some notes, just to get you started with custom charts.
Keep in mind the following two things:
- Every time you define a new extra chart, you have to increment the number in the name of each directive. For example, for the first extra chart the directive that defines the chart’s name would be ExtraSectionName1, for the second extra chart it would be ExtraSectionName2 etc.
- Every time you define a new extra chart, but you want it to include info from already parsed log files, you have to recreate the awstats historical statistical data. You can simply delete the contents of the /home/jsmith/awstats/statdata directory and parse all your log files again.
At least a basic knowledge of Regular Expressions is required in order to configure extra charts.
Top 50 RPM Downloads
This user-defined chart displays the Top 50 RPM package downloads (used for the current web site):
ExtraSectionName1="Top 50 RPM Downloads"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1=""
ExtraSectionFirstColumnTitle1="Package Name"
ExtraSectionFirstColumnValues1="URL,\/packages\/(.*)\.rcn.*\.rpm$"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=HB
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=50
MinHitExtra1=1
Top 100 Referrers by Domain
This user-defined chart displays the Top 100 Referrers by Domain. It also merges referrer URLs of the form www.domain.com and domain.com to just domain.com.
ExtraSectionName2="Top 100 Referrers by Domain"
ExtraSectionCodeFilter2="200 304"
ExtraSectionCondition2=""
ExtraSectionFirstColumnTitle2="Referring Domain"
ExtraSectionFirstColumnValues2="REFERER,^http:\/\/www\.([^\/]+)\/||REFERER,^http:\/\/([^\/]+)\/"
ExtraSectionFirstColumnFormat2="%s"
ExtraSectionStatTypes2=PHBL
ExtraSectionAddAverageRow2=0
ExtraSectionAddSumRow2=1
MaxNbOfExtra2=100
MinHitExtra2=1
Some notes
User-defined charts add much more flexibility to AWstats. Sometimes, even non-professional webmasters need to "dig" into the server logs for some special info about their web site. This can be perfectly achieved by using custom scripts, but the extra charts are a better way of doing this.
The three most important directives in the extra chart configuration are:
- ExtraSectionCodeFilter: This filters the log entries according to the HTTP code that the web server returned after a page or file request.
- ExtraSectionCondition: With this we can set some rules that define which entries will pass or not. The rules are of the form "URL, regular expression" and they can be separated with "||", which means "OR". Instead of the URL field, other fields like the User Agent string or the Referrer URL can be checked. These are documented in the configuration file’s comments. This directive can be left blank.
- ExtraSectionFirstColumnValues: This defines the value that will be displayed in the custom chart. It uses the same syntax as ExtraSectionCondition, but it could be considered as a third level of filtering. This directive cannot be left blank. An important thing to note is that you need to specify a group in the regular expression. This means that part or all of the regular expression must be in parentheses. Whatever this group matches will be the value in the chart.
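To see what a group would capture before recreating the statistics, the patterns from the two example charts can be tried from the shell. The sketch below uses sed -E as a stand-in for the Perl regular expressions that AWstats evaluates, so treat it as an approximation. The two helper functions and the sample URLs are made up, and the two alternative referrer patterns are folded into one expression with an optional "www." part.

```shell
#!/bin/sh
# Print what the group of the "Top 50 RPM Downloads" pattern extracts.
rpm_package_name() {
    printf '%s\n' "$1" | sed -nE 's|.*/packages/(.*)\.rcn.*\.rpm$|\1|p'
}

# Print the domain that the "Top 100 Referrers by Domain" patterns extract.
referrer_domain() {
    printf '%s\n' "$1" | sed -nE 's|^http://(www\.)?([^/]+)/.*|\2|p'
}

pkg=$(rpm_package_name '/packages/foo-1.2-1.rcn4.i386.rpm')
dom1=$(referrer_domain 'http://www.example.com/links.html')
dom2=$(referrer_domain 'http://example.com/')
miss=$(rpm_package_name '/images/logo.png')

echo "package: $pkg"
echo "domains: $dom1 $dom2"
echo "no match gives an empty value: '$miss'"
```

An empty result means that the pattern did not match, which corresponds to a log entry that would not show up in the chart.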
It’s clear that the knowledge of regular expressions is the absolute key in configuring an extra chart. This document is not intended to be a REGEX guide. I am not an expert on this anyway, so it would be pointless. Some helpful links can be found in the "Further Reading" section of this document.
Apache Configuration (optional)
Using this AWstats installation and configuration guide, there is no need for any special configuration at the web server level.
But, if you have not created the directory that holds the traffic reports (/home/jsmith/public_html/traffic) inside your DocumentRoot, then adding an Alias in your Apache VirtualHost configuration is necessary. For example, if you have created the traffic directory in /home/jsmith/traffic, then the following Alias must be added in your Apache or Virtual Host configuration file, so that the reports are accessible from a web browser:
Alias /traffic /home/jsmith/traffic
<Directory /home/jsmith/traffic>
AllowOverride AuthConfig
Options None
</Directory>
Access control directives can be added inside the <Directory> tags or in an .htaccess file, but this will not be covered in this document.
On the other hand, if you want to use awstats.pl as a CGI script in order to create the traffic reports dynamically from the web browser, then the addition of a ScriptAlias in your Apache or Virtual Host configuration is necessary. Assuming that you have followed the custom installation instructions of this guide, then this ScriptAlias could be:
ScriptAlias /traffic-bin/ "/home/jsmith/awstats/wwwroot/cgi-bin/"
<Directory "/home/jsmith/awstats/wwwroot/cgi-bin/">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
Now, point your web browser at:
http://www.mysite.com/traffic-bin/awstats.pl?config=mysite
All the awstats.pl options, except for -staticlinks, are supported, so you can try the following:
http://www.mysite.com/traffic-bin/awstats.pl?config=mysite&month=08&year=2005
Using awstats.pl as a CGI script, the reports are created in real-time from the statistical data, so it might be slow and it adds unnecessary load on the server. Furthermore, AWstats had some security related issues in the past, so using it as a CGI script is not recommended, unless you are sure that these problems have been solved or you implement access restrictions.
Further Reading
Here are some documents you might find useful:
- The AWstats Documentation
- The AWstats FAQ
- HTTP Status Code Definitions
- Regular Expression HOWTO
- Syntax of regular expressions in Perl (Perl Documentation)
- How to build HTMLDOC RPM package for Fedora Core by Thomas Chung
Thanks : George Notaras || A quick AWstats guide