mod_dnsbl 0.10

Internet management for the rest of us. Together with mod_clamav and mod_authz_ldap, mod_dnsbl provides a fairly complete solution for web content management.

The Problem

Many corporations want to censor the internet access of their employees, or are all but forced to by the legal situation in their countries. E.g. allowing access to pornographic sites may be construed as sexual harassment. For this reason, most administrators keep their own time-proven sets of regular expressions to match against the URI. However, there are some problems with this approach:

  1. Duplication of work: Censoring isn't really fun work, yet many administrators duplicate work that has already been done. Since different companies usually have different policies, pattern files cannot simply be shared. Furthermore, many organisations have several proxies, which means that rule files must be distributed to several hosts.
  2. Unsuspicious URLs: Many sites nowadays hide behind names that don't really tell anything about their contents. Such web site names lead to long regular expression lists, which are hard to maintain.
  3. False positives: A pattern that matches sex also matches msexchange. While open source advocates don't really mind if this particular string is blocked, the reason is quite different from the reason behind the original pattern.
  4. Administrative overhead: Usually, the people making the censoring decisions are nontechnical people, and the technicians then have to implement their decisions. An administrator may be tempted to write a web based GUI for this purpose.
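
The false positive described in item 3 is easy to reproduce. The following is a minimal sketch in Python, not in any particular proxy's pattern language, and the two URLs are made up for illustration:

```python
import re

# A naive, unanchored pattern of the kind found in many URI pattern files.
pattern = re.compile(r"sex")

# Hypothetical URLs, for illustration only.
urls = [
    "http://www.sex.example/",
    "http://msexchange.example/owa/",
]

def is_blocked(url):
    """Return True if the naive pattern matches anywhere in the URL."""
    return pattern.search(url) is not None

for url in urls:
    print(url, "blocked" if is_blocked(url) else "passed")
```

Both URLs are reported as blocked, although only the first one is what the pattern's author had in mind.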

In a way, the problem is similar to that of the lists of notorious spammers that mail administrators have come to maintain.

A Solution

A solution to this problem would have the following features:

Of course, this sounds very much like a description of the DNS. And this isn't a new idea: spam fighters already use the DNS as a distributed database to detect open relays or spam outfits.

The mod_dnsbl apache module proposes to extend the idea of DNS blacklists to website classification. Here is how it works.

The idea to also place the catalog in the DNS was rejected because a simple mapping from IP addresses to host names would block part of the namespace 127.in-addr.arpa for a very special purpose, which is not acceptable.

There are several performance features that make this solution technically more compelling than e.g. the Websense product. Large sites can easily do a zone transfer of the zone. With a caching-only DNS server, the round trip time for a site can be reduced to a local DNS round trip. Most systems nowadays also run a name service cache daemon, which is faster still. These caches often do negative caching as well, so that lookups for hosts that are not classified only seldom suffer a penalty.

The administration can also be simplified. Nowadays most sites have fancy GUI tools for their DNS administration, and these can also be given to nontechnical people. If you define canonical names for the category titles, the content administrator can just enter CNAME records for the sites he wants blocked. He does not need access to any of the proxies to activate the filtering.

An implementation

The Apache module mod_dnsbl is an implementation of this idea. It is distributed under the Apache license, and can be downloaded from http://software.othello.ch/mod_dnsbl/mod_dnsbl-0.10.tar.gz. It is installed like most other modules:

./configure --with-apxs=/usr/local/apache/bin/apxs
make
make install
The module is added to an Apache based proxy, and configured as follows:
DnsblSuffix	dnsbl.othello.ch
DnsblContact	webmaster@yourdom.ain
DnsblTemplate	/usr/local/apache/htdocs/dnsbl/template.html
DnsblDefaultAction	pass
# actions for squidguard lists
DnsblAction     127.0.0.1       block   Advertising
DnsblAction     127.0.0.2       block   Aggressive
DnsblAction     127.0.0.3       block   Audio-Video
DnsblAction     127.0.0.4       block   Drugs
DnsblAction     127.0.0.5       block   Gambling
DnsblAction     127.0.0.6       block   Hacking
DnsblAction     127.0.0.7       block   Proxy
DnsblAction     127.0.0.8       block   Violence
DnsblAction     127.0.0.9       block   "Illegal Software"
DnsblAction     127.0.0.10      pass    Mail
DnsblAction     127.0.0.11      block   Porn

# actions for adult list
DnsblAction	127.0.0.12	block	Adult

# our own actions
DnsblAction	127.1.0.1	skipauth	Open
DnsblAction	127.1.0.2	noscan		Virus-free

DnsblRecursionDepth	4
Please check the Apache documentation for the proxy configuration.

The implementation as an Apache module adds functionality not present in the regular expression list approach:

  1. We can allow access to certain resources without authentication.
  2. We can block or pass resources only at certain times during the day.
  3. We can control whether resources should be scanned or not.

Squidguard Blacklists

The squidguard project has generated nice blacklists, and the mod_dnsbl distribution includes a script blacklist2zone that converts the squidguard blacklists to a DNS zone. This zone is available under the suffix dnsbl.othello.ch. The sample configuration includes actions for all IP addresses used for squidguard categories.

Another very large blacklist can be found at http://cri.univ-tlse1.fr/documentations/cache/squidguard_en.html. It is particularly rich in adult URLs. The blacklist2zone script can also include this list in the same zone.
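
To give an idea of what such a conversion involves, here is a minimal sketch in Python. It is not the blacklist2zone script, whose input and output formats are not described here. It assumes a plain list of domains, one per line, and emits one A record per domain relative to the zone origin. The function name zone_records and the sample domains are hypothetical.

```python
def zone_records(domains, address):
    """Yield one A record line per domain, relative to the zone origin."""
    for line in domains:
        name = line.strip().lower()
        if not name or name.startswith("#"):
            continue  # skip blank lines and comments
        yield f"{name}\tIN\tA\t{address}"

# Hypothetical entries of a "Porn" list, mapped to 127.0.0.11 as in the
# sample configuration.
sample = ["Example-Adult.test\n", "\n", "another.example\n"]
records = list(zone_records(sample, "127.0.0.11"))
print("\n".join(records))
```

Because the names are relative to the zone origin, a name such as another.example would end up as another.example.dnsbl.othello.ch in a zone with that suffix.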

You may use the zone dnsbl.othello.ch on timon.othello.ch for testing. If you use it more seriously, please do a zone transfer, as my bandwidth is limited.

Squid Redirector

Also included with the distribution is a squid redirector that offers the core functionality of the mod_dnsbl module. Please consult the manual page dnsbl_redirector(1) for details. This redirector is currently not as functional as the Apache module: it does not, of course, understand anything about authentication or viruses.

TODO

This project is just beginning. If you are interested, please subscribe to the mailing list by sending a message containing the command

subscribe dnsbl
to majordomo@lists.othello.ch. There are quite a few things that need to be done:
  1. We need a more detailed catalog of categories that can reasonably be attached to web sites.
  2. We need an interface for people to add their existing blacklists.

Configuration Reference

DnsblSuffix

Syntax: DnsblSuffix dnsbl.dom.ain ...
Default: none
Context: server config

This directive sets the DNS suffixes within which the module looks for a given host name. The suffixes are checked in the order given, i.e. if some suffix turns up a classification which leads to a pass rating, no later domain can interfere with that. This can be used to override the classifications of a public DNS zone with those of a private one.
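
The ordered lookup can be sketched as follows. This is a simplified illustration in Python, not the module's code. It assumes that the query name is the host name followed by the suffix, and it treats the first answer of any kind as final, whereas the text above only guarantees this for a pass rating. The names classify, fake_zone, fake_resolve and the suffix private.example are hypothetical.

```python
import socket

def classify(host, suffixes, resolve=socket.gethostbyname):
    """Return (suffix, address) for the first suffix that knows host.

    Assumes the query name is "<host>.<suffix>". Returns None if no
    suffix has an entry for the host.
    """
    for suffix in suffixes:
        try:
            return suffix, resolve(f"{host}.{suffix}")
        except OSError:
            continue  # no record under this suffix, try the next one
    return None

# Stand-in resolver so that the example runs without network access.
fake_zone = {
    "www.example.com.private.example": "127.1.0.1",
    "www.example.com.dnsbl.othello.ch": "127.0.0.11",
}

def fake_resolve(name):
    if name not in fake_zone:
        raise OSError("no such name")
    return fake_zone[name]

result = classify(
    "www.example.com", ["private.example", "dnsbl.othello.ch"], fake_resolve
)
print(result)
```

With the private suffix listed first, its answer wins and the public classification is never consulted.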

DnsblContact

Syntax: DnsblContact content@dom.ain
Default: none
Context: server config

This directive sets the email address that is inserted into the error page that informs the user that she is not allowed to view the page.

DnsblTemplate

Syntax: DnsblTemplate /path/to/file.html
Default: none
Context: server config

The file /path/to/file.html is used as a template for the page that tells the user that she is not allowed to view a page, and why. The following replacements are performed before the page is sent to the user: '%%' is replaced by '%', '%u' is replaced by the requested URL, '%r' is replaced by the reason, and '%c' is replaced by the contact address specified with DnsblContact.
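
The replacement rules can be illustrated with a small Python sketch. It is not the module's implementation. The function name render_template is hypothetical, and leaving unknown sequences such as '%x' untouched is an assumption.

```python
import re

def render_template(template, url, reason, contact):
    """Expand %%, %u, %r and %c in a single left-to-right pass."""
    values = {"%": "%", "u": url, "r": reason, "c": contact}
    # Other sequences such as "%x" are left untouched (an assumption).
    return re.sub(r"%([%urc])", lambda m: values[m.group(1)], template)

page = render_template(
    "100%% blocked: %u (%r). Contact %c.",
    "http://www.example.com/",
    "Gambling",
    "webmaster@yourdom.ain",
)
print(page)
```

Doing all replacements in one pass matters: a requested URL that itself contains '%c' is inserted literally and is not expanded a second time.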

DnsblDefaultAction

Syntax: DnsblDefaultAction { action }
Default: pass
Context: server config

By default, the module allows all requests. However, in some applications it might be desirable to use the system as a whitelist, in which case the default action should be block. See below for possible actions.

DnsblAction

Syntax: DnsblAction ip { action } [ reason ]
Default: none
Context: server config

This directive adds a blocking (or passing) rule to the rule table of the module. If the DNS query returns ip, the module will react according to the second argument. The block page displayed will include the string given as third argument for the reason, or the IP address if no reason is specified. See below for possible actions.
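
Conceptually, the rule table is a mapping from returned addresses to actions. The Python sketch below illustrates this under two assumptions that the text does not state explicitly: an address without a matching rule falls back to the default action, and so does a host without any DNS answer. The names ACTIONS, DEFAULT_ACTION and lookup are hypothetical, and the reason for 127.1.0.1 is omitted on purpose to show the fallback.

```python
DEFAULT_ACTION = "pass"  # corresponds to DnsblDefaultAction

# (action, reason) keyed by the address returned from the DNS.
ACTIONS = {
    "127.0.0.5": ("block", "Gambling"),
    "127.0.0.10": ("pass", "Mail"),
    "127.1.0.1": ("skipauth", None),  # no reason given
}

def lookup(address):
    """Return (action, reason) for a DNS answer, or the default action."""
    if address is None or address not in ACTIONS:
        return DEFAULT_ACTION, None
    action, reason = ACTIONS[address]
    # Without a reason, the IP address itself is shown instead.
    return action, (reason if reason is not None else address)

print(lookup("127.0.0.5"))
print(lookup("127.1.0.1"))
print(lookup(None))
```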

DnsblAuthoritative

Syntax: DnsblAuthoritative { on | off }
Default: off
Context: server config

This directive turns mod_dnsbl into a dummy authentication module, which accepts every user as authenticated provided the action for the URL is skipauth. This makes it possible to build proxies that accept connections to some sites without authentication, while others still require authentication.

DnsblRecursionDepth

Syntax: DnsblRecursionDepth depth
Default: 0
Context: server config

This directs mod_dnsbl to also analyze URLs. If a name returns the IP address 127.255.255.255, then the module will map at most depth components of the URL path to a DNS name and will try to find an action for it in the DNS. If set to 0, no URL path matching is done.

DnsblMessage

Syntax: DnsblMessage message string
Default: none
Context: server config

Use the contents of this string as a template for the block page. The same replacements as with the DnsblTemplate directive are used.

Actions

With every IP address, we can associate an action string. An action string consists of comma-separated rules that tell the module what to do with the request. Each rule is composed of an action verb and an optional time specification. E.g. the following action string

pass/12:00-14:00,noscan
means that this resource should be passed between 12:00 and 14:00 local time, and nothing should be scanned for viruses. Or
block/08:00-17:00,pass,scan
means that the resource should be blocked during office hours, and after hours, everything should be scanned for viruses, even items that are normally considered safe. Note that time specifications are always in the form HH:MM-HH:MM.
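
The two examples above can be traced with a small Python sketch. It is one possible reading of the rules, not the module's parser. It assumes that the first block or pass rule whose time window matches decides the verdict, that the other verbs are simply collected, and that a window does not cross midnight. The names parse_action and evaluate are hypothetical.

```python
from datetime import time

def parse_action(action):
    """Split an action string into (verb, start, end) rules."""
    rules = []
    for rule in action.split(","):
        verb, _, spec = rule.strip().partition("/")
        if spec:
            start, end = (time.fromisoformat(t) for t in spec.split("-"))
        else:
            start = end = None  # no time specification: always applies
        rules.append((verb, start, end))
    return rules

def evaluate(action, now):
    """Return (verdict, flags) in effect at local time `now`.

    verdict is "block", "pass" or None (no rule decided, so the default
    action would apply); flags holds the remaining verbs that matched.
    """
    verdict, flags = None, set()
    for verb, start, end in parse_action(action):
        if start is not None and not (start <= now < end):
            continue  # outside this rule's time window
        if verb in ("block", "pass"):
            verdict = verdict or verb  # first matching rule wins
        else:
            flags.add(verb)
    return verdict, flags

print(evaluate("block/08:00-17:00,pass,scan", time(9, 0)))
print(evaluate("block/08:00-17:00,pass,scan", time(18, 0)))
```

Under these assumptions the second example yields a block verdict at 09:00 and a pass verdict with the scan flag at 18:00.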

The following action verbs are known to the module:

block
block this resource unconditionally.
pass
pass this resource.
skipauth
if the resource is passed, then also don't ask for authentication.
scan
Always perform virus scanning, even if mod_clamav would normally not scan this resource. This makes it possible to have the module scan images, which are normally deemed harmless, if they come from certain sites.
noscan
Don't scan this resource for viruses. This may be necessary in cases where downloading virus patterns through a proxy may trigger the virus scanner.

© 2003 Dr. Andreas Müller, Beratung und Entwicklung