Web Site Blocking with PAC Files

   

How to block undesirable web sites by filtering key words in the URL.


Overview
    You don't need to spend money to stop your kids (or your employees or coworkers) from accidentally browsing to a web site that's going to cause embarrassing problems. You don't even need to install a new program. The magic is done with a file called a "Proxy Auto Configuration" (PAC) file.

    A PAC file contains some JavaScript code that lets your browser know what route it has to take to connect to different sites on the Internet. For most home users, the browser normally connects directly to a web site. However, it's also possible to have the browser connect to a "proxy" computer that gets the web content and passes it back to your browser. If you connect through a proxy, the proxy can act as an intelligent "man in the middle", blocking pornography, gambling, advertising, and viruses. Sounds great, huh? What's the catch? Well, the companies that control proxy servers usually charge you money to use those proxy servers. That takes all the fun out of it!

    Now that you understand a bit about proxies and PAC files, maybe you see a way out! What if the PAC file told your browser it could connect directly to a site like www.disney.com, but told your browser it had to use a non-existent proxy to connect to www.sex.com? Then every time you tried to browse to www.sex.com, your browser wouldn't find the proxy, so couldn't make the connection! That is exactly what happens. Well... Almost exactly. Which is to say... not exactly. What really happens is that the PAC file tells your browser to connect to your own PC instead of connecting to the "bad" web site. Which accomplishes the same thing. 
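
    To make the idea concrete, here is a minimal sketch of what such a PAC file could look like. It is not the actual hostsfile.org file. The browser calls a function named FindProxyForURL for every request and follows the routing string that comes back. The address 127.0.0.1:80 (your own PC, port 80) and the single hard-coded host are assumptions made for illustration.

```javascript
// Sketch of a PAC file. The browser calls FindProxyForURL(url, host)
// for each request and follows the routing string it returns.

// Assumption: 127.0.0.1:80 stands in for "your own PC"; the port is illustrative.
var BLOCKED = "PROXY 127.0.0.1:80";
var ALLOWED = "DIRECT";

function FindProxyForURL(url, host) {
    // Hypothetical single-site rule, just to show the two possible answers.
    if (host.toLowerCase() === "www.sex.com") {
        return BLOCKED;   // send the request to the local PC instead of the site
    }
    return ALLOWED;       // connect straight to the web site
}
```

    With this sketch, a request for www.disney.com gets "DIRECT" and a request for www.sex.com is pointed at the local machine, where nothing useful answers.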

    The PAC file has to contain (built in to the JavaScript) all the "bad" words you want to filter and block. The filtering is done to the URL, not to the web page body.  So a page like this (which has bad words in it) would get through because the URL doesn't have any bad words in it. Okay? Nothing is perfect. But the good part is that because the words are checked against the entire URL, that means you can't bypass it by searching through Google and viewing Google's cached results.

    The list of bad words isn't just words: the words are buried pretty deep in JavaScript code. But it is plain text, so you can add and remove words with nothing more than the Windows "Notepad" program. Here are a few (tame) lines from the PAC file:

BadURL_Parts[i++] = "busty";
BadURL_Parts[i++] = "celeb";
BadURL_Parts[i++] = "centerfold";

    I bet you'll have no problem figuring out how to modify those words to suit your own needs. You just open up Notepad (under "Start", "Programs", "Accessories"), then drag the PAC file (named "proxy") into Notepad. Save it when you're done. Your computer may freeze for a few seconds after you save the file, and you may have to restart your browser to make your changes active, but that's all there is to it.
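
    If you're curious how a word list like that might be used, here is a rough sketch. Only the BadURL_Parts lines come from the real file. The helper containsBadPart, the loop, and the 127.0.0.1:80 address are my assumptions about one simple way to do it, not a copy of the actual code.

```javascript
// Word list in the same style as the PAC file excerpt.
var BadURL_Parts = [];
var i = 0;
BadURL_Parts[i++] = "busty";
BadURL_Parts[i++] = "celeb";
BadURL_Parts[i++] = "centerfold";

// Hypothetical helper: true if any listed word appears anywhere in the URL.
function containsBadPart(url) {
    var lower = url.toLowerCase();   // compare without regard to case
    for (var n = 0; n < BadURL_Parts.length; n++) {
        if (lower.indexOf(BadURL_Parts[n]) !== -1) {
            return true;
        }
    }
    return false;
}

function FindProxyForURL(url, host) {
    // Assumption: 127.0.0.1:80 is the local "dead end" for blocked URLs.
    return containsBadPart(url) ? "PROXY 127.0.0.1:80" : "DIRECT";
}
```

    Because the whole URL is searched, a cached copy requested as http://www.google.com/search?q=cache:www.example.com/centerfold.html is caught too. The flip side of plain substring matching is that "celeb" also blocks a URL containing "celebrate", so choose your words with care.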

 

Automatic PAC File Download, Installation, and Configuration

    The PAC file script will automatically download the most recent PAC file from hostsfile.org. It will put the downloaded file in the same folder as your "hosts" file, set up your local intranet zone and proxy caching, and configure Internet Explorer to use the new PAC file. The script will only set up "LAN" settings (DSL, broadband, cable, or satellite) and will not affect dialup connections. It will only set up the Internet Explorer browser and won't affect Firefox or any other browser.


Manual PAC File Installation

    If you need support for dialup or other browsers, you can use the above script to get most of the work done, but you'll need to follow the "dialup" and "other browsers" parts of the installation instructions below to finish the job.

1.   Where to put the PAC file
Most people who discuss PAC files are kind of vague about where to put them or what to name them. I'm going to be specific. The PAC file should be named "proxy" with no file extension and it should be in the same folder as your "hosts" file. Why? This puts it in a folder normally reserved for system files (which is good, because this is a system file), the lack of a file extension makes it look like all the other files there (so it won't attract attention), and the lack of a file extension makes it difficult for kids to open. The folder we are discussing is located here:
XP:     C:\Windows\system32\drivers\etc\
2000:   C:\WINNT\system32\drivers\etc\
98/ME:  C:\Windows\

2.   Configure your browser to use the PAC file

In the Internet Explorer menu, select "Tools", then "Internet Options", then go to the "Connections" tab. Click the "LAN Settings..." button if you have broadband, or the "Settings..." button if you have a dialup connection. If you aren't sure, you can do both.


Check the "Use automatic configuration script" box and enter the location of your PAC file. You must use the "file://" protocol when specifying your file location. It's a little awkward to type in because all the slashes go the "wrong way", but when you get done, you should have something like this:
XP or Vista:  file://C:/Windows/system32/drivers/etc/proxy
2000:         file://C:/WINNT/system32/drivers/etc/proxy
98/ME:        file://C:/Windows/proxy
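
If flipping the slashes by hand bothers you, the conversion is easy to sketch in JavaScript. The function name toPacFileUrl is made up for this example. It only does what the table above shows: it turns the backslashes around and puts "file://" in front.

```javascript
// Hypothetical helper: turn a Windows path into the file:// form shown above.
function toPacFileUrl(windowsPath) {
    // Replace every backslash with a forward slash, then add the protocol.
    return "file://" + windowsPath.replace(/\\/g, "/");
}

// In JavaScript source, each backslash in a string literal is written twice.
var pacUrl = toPacFileUrl("C:\\Windows\\system32\\drivers\\etc\\proxy");
// pacUrl is "file://C:/Windows/system32/drivers/etc/proxy"
```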

If you have cable broadband or DSL, you may want to do this for both the "LAN Settings" and "Settings" so you can't bypass everything by going through a dialup connection!

If you only have dialup, you still may want to do this for the "LAN Settings" so it will all be ready when you upgrade next year.

If you have a browser other than IE, the settings will be different, but not so much that you can't figure it out. For example, in Firefox, you go to the menu and select "Tools", then "Options", then select the "General" category, click the "Connection Settings" button, select "Automatic proxy configuration URL", and enter the path to your PAC file. Pretty simple.

3.   Define your local intranet security zone
Microsoft wrongly assumes that if there is a proxy (even one that doesn't exist), anything that doesn't use the proxy must be in your local intranet. Wrong! That would be a very bad thing from a security point of view! Typical Microsoft. We need to fix that. In the Internet Explorer menu, select "Tools", then "Internet Options", then open the "Security" tab. Select the "Local intranet" icon, then hit the "Sites" button. Remove the check from the "Include all sites that bypass the proxy server" box.

4. Disable Proxy Caching
When IE sees that a proxy is needed for a web site, it uses the same proxy again later for the same web site. Normally, that's a good thing; a time-saver. For our purposes, it's ruinous! If I do a Google search for "sex", I want the PAC file to block it by directing the browser to a bad proxy. If IE remembers that proxy, that means it will continue blocking Google, even if I later search for "flowers". We need to force IE to look in the PAC file every time to decide whether a proxy is needed or not! If you want to do it manually by making a registry change, you can read http://support.microsoft.com/?kbid=271361. If you do, you'll end up with a DWORD value named "EnableAutoProxyResultCache" set to 0 under HKEY_CURRENT_USER\Software\Policies\Microsoft\Windows\CurrentVersion\Internet Settings.


Notice that the registry section shown is for the current user. That means you'll have to change this setting for all your users. Generally this means you'll have to actually log in as every user and apply this setting. Of course, you could write a script to do it and put it in the "All Users Startup" or in the HKLM run section, but we're talking about manually installing things here.
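
To see why the caching hurts, here is a toy model in JavaScript. It is not how IE is actually written. It just imitates a browser that remembers one answer per host, next to one that asks the PAC function every time. The names findProxy and makeLookup, and the 127.0.0.1:80 address, are made up for the illustration.

```javascript
// Toy PAC rule: block any URL that contains "sex".
function findProxy(url, host) {
    return url.toLowerCase().indexOf("sex") !== -1 ? "PROXY 127.0.0.1:80" : "DIRECT";
}

// Wrap a PAC function in a browser-like lookup, with or without a per-host cache.
function makeLookup(pacFunction, cacheByHost) {
    var cache = {};
    return function (url, host) {
        if (cacheByHost && Object.prototype.hasOwnProperty.call(cache, host)) {
            return cache[host];   // reuse the earlier answer; the PAC rule is not consulted
        }
        var answer = pacFunction(url, host);
        if (cacheByHost) {
            cache[host] = answer; // remember the answer for this host
        }
        return answer;
    };
}

var cached = makeLookup(findProxy, true);
var uncached = makeLookup(findProxy, false);
var host = "www.google.com";

// First a blocked search, then a harmless one, on the same host.
cached("http://www.google.com/search?q=sex", host);
var cachedFlowers = cached("http://www.google.com/search?q=flowers", host);

uncached("http://www.google.com/search?q=sex", host);
var uncachedFlowers = uncached("http://www.google.com/search?q=flowers", host);

console.log(cachedFlowers);   // PROXY 127.0.0.1:80 (still blocked)
console.log(uncachedFlowers); // DIRECT
```

With the cache on, the "flowers" search inherits the answer from the "sex" search because both go to www.google.com. With the cache off, each URL is judged on its own.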

5. Empty your browser's cache
This is really optional. If you've been to a site you're trying to block, the page and pictures are in your cache. If you go there again, even if the PAC file blocks it, your browser could show you the old data.
  • In IE, go to the menu and select "Tools", then "Internet Options". On the "General" tab in the "Temporary Internet files" section, click the "Delete files..." button.
  • In Firefox, go to the menu and select "Tools", "Options", and click the "Privacy" icon. In the "Private Data" section, hit the "Settings" button and ensure "Cache" is checked. Hit the "Clear Now" button to actually clear your cache.
  • In some other browser, figure it out yourself!

6. Set up a dummy web server
    The only reason you may need to do this is if you don't like to see error messages in your web pages! The easy way to eliminate the error messages is to install a small specialized web server or proxy server on your PC. Each time your browser tries to connect, the specialized program gives the browser a small transparent image rather than an error message. Problem solved!
    A great specialized web server is "Homer". Naturally, I've made a Homer auto-installation script that will download and install Homer correctly for you.
The great thing about Homer is that in addition to returning a blank image to replace the undesirable image or error message that might have originally displayed, Homer also has a log to show you what URLs are being blocked.
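
    If you want a feel for what such a program does, here is a bare-bones sketch for Node.js. It is not Homer and doesn't claim to work the way Homer does. It answers every plain HTTP request with a 1x1 transparent GIF and keeps a list of the URLs it was asked for. The names handleRequest and blockedLog, and the DUMMY_PORT environment variable, are inventions for this example.

```javascript
var http = require("http");

// A 1x1 transparent GIF, base64-encoded.
var BLANK_GIF = Buffer.from(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7", "base64");

// In-memory record of the URLs that were sent here instead of the real site.
var blockedLog = [];

function handleRequest(req, res) {
    // When the browser uses this server as its proxy, req.url holds the full blocked URL.
    blockedLog.push(req.url);
    console.log("blocked: " + req.url);
    res.writeHead(200, {
        "Content-Type": "image/gif",
        "Content-Length": BLANK_GIF.length
    });
    res.end(BLANK_GIF);   // send the blank image instead of an error page
}

var server = http.createServer(handleRequest);

// Hypothetical switch: listen only when DUMMY_PORT is set, for example DUMMY_PORT=80.
if (process.env.DUMMY_PORT) {
    server.listen(Number(process.env.DUMMY_PORT), "127.0.0.1");
}
```

    The port has to match whatever address your PAC file hands out for blocked sites. This sketch only answers plain HTTP requests and forgets its log when it stops, so treat it as a demonstration rather than a replacement for a real program.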


Resources

HostsFile.org PAC files and Hosts files
http://hostsfile.org/

John Lo Verso's "Bust Banner Ads with Proxy Auto Configuration" web page:
http://www.schooner.com/~loverso/no-ads/

Sheryl Canter's "Kill Internet Ads with HOSTS and PAC Files" article:
http://www.sherylcanter.com/articles/oreilly_20040330_HostsPac.php

Homer LocalHost web server (prevents visible error messages when you block sites)
http://www.funkytoad.com/

"WebWasher Classic" Internet filter program is free for personal use:
http://www.cyberguard.com/products/webwasher/webwasher_products/classic/index.html

Open Source "Privoxy" is a true proxy that can handle multiple computers:
http://www.privoxy.org/

Pyrenean's "eDexter" local image web server. "DNSKong" personal DNS server:
http://dnskong.com/

Internet Filter's list of over 400,000 known porn web sites:
http://research.vision-options.com/research.php



 
