Script to process Apache log files to fight spammers / DDoS attackers
One of the challenges of running dynamic websites is that you have to keep fighting malicious users who sap your server capacity with rogue crawling of your site. To do this you have to monitor and analyze the traffic patterns on the server regularly. This matters most during load spikes, when you want to find the IP addresses, user agents and specific URLs responsible for them. It is all the more relevant for Drupal sites, where a rogue bot can take down the site if proper DDoS protection is not in place.
The latest copy of the script can be downloaded from
https://github.com/zyxware/misc-utils/tree/master/ls-httpd
You can copy the script to /usr/local/bin or to any other folder that is in your $PATH on the server. Remember to configure the script with the path to your Apache access log by updating the default value of the variable log_file. Note that the script was written for the specific log file format used on our servers, so you may need to tweak the awk parameters if your Apache log file uses a different format.
Alternatively you can use the following as your Apache log file format in apache.conf
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
and then ensure that the access log in your VirtualHost configuration uses the combined format
CustomLog ${APACHE_LOG_DIR}/access.log combined
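To see why the awk parameters depend on the log format, here is a minimal sketch of how one line in the combined format above breaks into fields. It is not taken from ls-httpd itself. The sample log line and its values are made up for illustration, and the field positions hold only for this format.

```shell
#!/bin/sh
# Made-up sample line in the combined format (all values are hypothetical).
line='203.0.113.7 - - [10/Oct/2023:17:21:45 +0000] "GET /node/1 HTTP/1.1" 200 5120 "-" "ExampleBot/1.0 (+http://example.com/bot)"'

# Splitting on whitespace: field 1 is the client IP, field 7 the request path.
ip=$(printf '%s\n' "$line" | awk '{ print $1 }')
url=$(printf '%s\n' "$line" | awk '{ print $7 }')

# The user agent contains spaces, so split on double quotes instead:
# field 6 is then the text inside the third quoted string.
agent=$(printf '%s\n' "$line" | awk -F'"' '{ print $6 }')

printf 'ip=%s\nurl=%s\nagent=%s\n' "$ip" "$url" "$agent"
```

If your format adds or removes a field before the request, these positions shift and the awk field numbers need to be adjusted to match.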
The following are some example usage patterns
ls-httpd url 1000
will find top URLs in the last 1000 access log entries
ls-httpd ip 1000
will find top IPs in the last 1000 access log entries
ls-httpd agent 1000
will find top user agents in the last 1000 access log entries
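The count-based commands above boil down to a familiar pipeline: take the last N lines, extract one field, and count duplicates. The following is a rough sketch of that idea, assuming the combined log format. The function name top_values, the limit of ten results and the example log path are assumptions for illustration, not details of the actual script.

```shell
#!/bin/sh
# top_values FIELD COUNT FILE
# Hypothetical helper (not taken from ls-httpd): prints the ten most frequent
# values of FIELD (ip, url or agent) among the last COUNT lines of FILE.
top_values() {
    field=$1
    count=$2
    file=$3
    case $field in
        ip)    sep=' '; prog='{ print $1 }' ;;   # client address
        url)   sep=' '; prog='{ print $7 }' ;;   # request path
        agent) sep='"'; prog='{ print $6 }' ;;   # third quoted string
        *)     echo "unknown field: $field" >&2; return 1 ;;
    esac
    tail -n "$count" "$file" |
        awk -F "$sep" "$prog" |
        sort | uniq -c |
        sort -rn | head -n 10
}

# Example (path is an assumption): top_values ip 1000 /var/log/apache2/access.log
```

Each output line is a count followed by the value, with the most frequent value first.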
ls-httpd url 17:
will find top URLs from 17:00:00 to 17:59:59
ls-httpd url 17:2
will find top URLs from 17:20:00 to 17:29:59
ls-httpd url 17:21
will find top URLs from 17:21:00 to 17:21:59
ls-httpd url 17
will find top URLs in the last 17 access log entries, since an argument without a colon is treated as a count :-)
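The difference between 17 and 17: can be pictured as a simple dispatch on whether the argument contains a colon. The sketch below is one way to do it, assuming the combined log format. The function name select_lines is hypothetical, and the real script may work differently.

```shell
#!/bin/sh
# select_lines ARG FILE
# Hypothetical helper (not taken from ls-httpd): prints the log lines that a
# later counting step would summarise.
select_lines() {
    arg=$1
    file=$2
    case $arg in
        *:*)
            # Time prefix such as "17:", "17:2" or "17:21". In the combined
            # format field 4 looks like "[10/Oct/2023:17:21:45", so the time
            # starts right after the first colon. This sketch matches that
            # time on every day present in the file.
            awk -v t="$arg" '{
                time = substr($4, index($4, ":") + 1)
                if (index(time, t) == 1) print
            }' "$file"
            ;;
        *)
            # Plain number: the last ARG lines of the log.
            tail -n "$arg" "$file"
            ;;
    esac
}
```

Because the match is a plain prefix comparison on HH:MM:SS, 17:2 covers 17:20:00 to 17:29:59 without any special handling.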
Hope you find this useful.