In the UK? Being told you’re already downloading or have failed the CAPTCHA too many times by file hosting sites for no reason? You might be being IWF’d.

If you’d tried to download a file from either filesonic or fileserve some time between 3am on 15/04/2011 and 7pm on 16/04/2011 [1], you might’ve noticed something kind of odd. You might have got an error similar to one of the following.

An error message from Fileserve

An error message from Filesonic

Your first assumption in the first case, as mine was, might be that you’ve been assigned an IP address previously assigned to someone who has failed the CAPTCHA many times. However, the second claims someone else is downloading at the same time. At first glance, this seems to leave only two possibilities: someone else on your connection is downloading a file (nope) or your ISP is doing large-scale NAT (nope, which is a relief).

After a brief IM with an acquaintance who has clearly done their homework but wishes to remain anonymous, I was informed I’d been IWF’d, something that didn’t sound particularly pleasant and, as it turns out, isn’t.

Wait, who are the IWF? What are they doing exactly?

The IWF are the quango who have taken it upon themselves to filter bits of the Internet in the UK. ISPs then subscribe to a list of blocked domains and individual URLs. Enforcing the blocking of an individual domain is relatively easy: don’t respond to DNS requests if the user happens to be using your nameserver and, more importantly, drop any packets to and from the associated IP address. Therefore this can be done at the IP layer. [2]

Sites which allow user content, including one-click file hosting sites (e.g. Rapidshare), present a more complex challenge for censors. Since anyone can upload files to them, people can upload the types of files the IWF block. Although these files will almost always eventually be removed by the administrators of such sites following a complaint, there will inevitably be a delay between a complaint being filed and an actual takedown. In this interim period, the IWF want to prevent the files from being downloaded. However, blocking the whole domain would be too aggressive, so the IWF want to block only individual URLs. Unfortunately, this can only be done at the HTTP layer, since that is the layer at which URLs exist, which means it’s necessary to pass all traffic for the site through an HTTP proxy. This is done by your ISP routing any packets addressed to, say, Fileserve’s IP address to their own proxy instead of to Fileserve’s servers. Your ISP’s proxy then checks whether the URL is blocked. If it is, one of several things may happen depending on your ISP: in some cases the connection is dropped; in others a 404 File Not Found (dishonest) or a 403 Forbidden (honest) is returned. If the URL isn’t blocked, the ISP’s proxy makes the request to Fileserve on your behalf and passes Fileserve’s response back to you.
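The proxy’s decision can be sketched in a few lines of Python. This only illustrates the logic described above; it is not how Cleanfeed or any ISP’s proxy is actually implemented. The blocklist entry, the block_action setting and the function name are all made up for the example.

```python
# Hypothetical blocklist of individual URLs (host + path).
BLOCKED_URLS = {"fileserve.example/file/abc123"}


def proxy_decision(host, path, block_action="403"):
    """Return what the ISP's proxy does with a request for http://host/path.

    block_action is an assumed per-ISP choice: "drop", "404" or "403".
    """
    url = host.lower() + path
    if url in BLOCKED_URLS:
        return block_action
    # Not blocked: the proxy fetches the page itself and relays the answer,
    # so the site sees the proxy's IP address rather than yours.
    return "forward"
```

Note that even the "forward" case goes through the proxy, which is what causes the trouble described next.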

Actually it turns out this second technique is fully general, so there’s no need to use the first one, so CleanFeed and co don’t.

How does this relate to the original problem?

Putting lots of people behind a proxy breaks many web services. This is because, as far as the service is concerned, all of a particular ISP’s users appear to be coming from a few IP addresses (for example, Virgin Media Broadband have just three proxy servers). Many web services make the (often false, but close enough to true to be workable) assumption that one IP address means one user. Perhaps the most famous case of the IWF causing this type of breakage is when they blocked Wikipedia’s Virgin Killer page, thereby breaking anonymous edits. In this case, it only takes a relatively small proportion of Fileserve users to fail the CAPTCHA for everyone to be locked out. Even if that weren’t the case, the one file per IP policy of both Fileserve and Filesonic ensures that only a few people from a whole ISP will be able to download at once.
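A toy model makes the lockout easy to see. The failure limit and the class below are invented for illustration; the real sites’ thresholds and bookkeeping are unknown.

```python
from collections import Counter

MAX_CAPTCHA_FAILURES = 3  # assumed limit, purely for the example


class PerIpLimiter:
    """Toy model of a site that counts CAPTCHA failures per IP address."""

    def __init__(self):
        self.failures = Counter()

    def record_failure(self, ip):
        self.failures[ip] += 1

    def is_locked_out(self, ip):
        return self.failures[ip] >= MAX_CAPTCHA_FAILURES


limiter = PerIpLimiter()
PROXY_IP = "192.0.2.1"  # documentation address standing in for an ISP proxy

# Three different customers each fail once, but all arrive via the same proxy.
for customer in ("alice", "bob", "carol"):
    limiter.record_failure(PROXY_IP)

# A fourth customer who has never touched the site is already locked out.
print(limiter.is_locked_out(PROXY_IP))      # True
print(limiter.is_locked_out("192.0.2.99"))  # False
```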

How can I be sure I’ve been IWF’d?

(This answer is likely to be more technically in depth than the rest of the post, so feel free to skip it.) There are many different techniques and this is an area in which your mileage is likely to vary significantly given each ISP has a different set up for filtering traffic. Two particular systems are Cleanfeed and WebMinder. In general though there are three families of techniques:

  1. Inspecting the HTTP response headers.
  2. Using traceroute in various ways.
  3. Convincing the website to display what it thinks your IP address is and comparing it to your external IP received from an unblocked site, say whatismyip.com.

The first way is probably easiest. Here’s an example from Virgin Media Broadband:

$ telnet filesonic.com 80
Trying 78.140.176.180...
Connected to filesonic.com.
Escape character is '^]'.
HEAD / HTTP/1.0 

HTTP/1.0 301 Moved Permanently
Server: nginx
Date:
Content-Type: text/html
Content-Length: 178
Location: http://www.filesonic.com/
Age: 0
Via: HTTP/1.1 webcache1-know.server.virginmedia.net (Traffic-Server/5.7.0-59705 [cMs f ])

The important line here is the Via header. The HTTP standard specifies that proxy servers must insert this; however, some are naughty and don’t. If your response has a Via header and you aren’t explicitly using a proxy server, then its presence probably indicates you’re being IWF’d. It is possible that some reverse proxies will insert a Via header, so to be sure you should look at the hostname of the proxy server after the “Via: ”. If it contains your ISP’s name, then it seems very likely you’re being IWF’d. Instead of telnet, you may prefer to use the header inspection tool in your web browser (e.g. in Chrome spanner→tools→developer tools).
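The same check can be scripted. Here is a minimal Python sketch using only the standard library; the function names and the keyword list you pass in are my own choices, and the parsing is deliberately naive (it assumes no commas inside the parenthesised comment).

```python
import http.client


def fetch_via_header(host, timeout=10):
    """Send HEAD / to host on port 80 and return the Via header, or None."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().getheader("Via")
    finally:
        conn.close()


def via_hosts(via_value):
    """Extract the proxy name from each comma-separated Via entry."""
    hosts = []
    for entry in via_value.split(","):
        parts = entry.split()
        if len(parts) >= 2:  # e.g. ["HTTP/1.1", "proxy.example", ...]
            hosts.append(parts[1].lower())
    return hosts


def mentions_isp(via_value, isp_keywords):
    """True if any proxy named in the Via header contains one of the keywords."""
    return any(k.lower() in h for h in via_hosts(via_value) for k in isp_keywords)
```

For example, feeding it the header from the transcript above with the keyword "virginmedia" flags the response as having come through the ISP’s proxy.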

Unfortunately, most ISPs aren’t this up front. Therefore it is likely you will have to resort to using traceroute in most cases. One technique with traceroute is to

  1. Do a traceroute.
  2. Check each intermediate node against a list of known IWF proxies. One such list was created during the Wikipedia incident. Even if there isn’t an exact match, it is worth being suspicious in the case that one of the nodes is in the same class C (or possibly, as is the case with Virgin Media Broadband, class B) subnet as a known IWF proxy. It’s probably also wise to look for suspicious names in the hostname such as webblock, proxy, or webcache.
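Step 2 is tedious by hand, so here is a rough Python sketch of it. It assumes you have already turned the traceroute output into (ip, hostname) pairs and have a list of known proxy addresses from somewhere; the function names are mine, and the addresses in any example are documentation addresses, not real proxies.

```python
import ipaddress

SUSPICIOUS_WORDS = ("webblock", "proxy", "webcache")  # names suggested above


def shares_subnet(hop_ip, known_proxy_ip, prefix_len=24):
    """True if both addresses fall in the same /prefix_len network.

    prefix_len=24 is an old 'class C' sized block, 16 a 'class B' sized one.
    """
    network = ipaddress.ip_network(f"{known_proxy_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(hop_ip) in network


def suspicious_hops(hops, known_proxies, prefix_len=24):
    """Return the (ip, hostname) hops that look like a filtering proxy."""
    flagged = []
    for ip, hostname in hops:
        near_known = any(shares_subnet(ip, p, prefix_len) for p in known_proxies)
        odd_name = any(word in hostname.lower() for word in SUSPICIOUS_WORDS)
        if near_known or odd_name:
            flagged.append((ip, hostname))
    return flagged
```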

Another technique is to traceroute multiple IP addresses known to not be IWF’d and your suspected IWF’d address. If it’s IWF’d the path should deviate from the normal path sooner, indicating the packet is making its way to your ISP’s proxy, which is within the ISP’s network.
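Comparing paths by eye gets old quickly. A small helper like the following (a sketch; the name is made up, and each path is assumed to be a list of hop IP addresses you have already extracted) reports where two paths part ways.

```python
def first_divergence(path_a, path_b):
    """Index of the first hop at which two traceroute paths differ.

    Returns None when the paths are equal or one is a prefix of the other.
    """
    for index, (hop_a, hop_b) in enumerate(zip(path_a, path_b)):
        if hop_a != hop_b:
            return index
    return None
```

If the suspect address diverges from every known-good path at an earlier index than the known-good paths diverge from each other, that is a hint the traffic is being diverted inside your ISP’s network.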

Some ISPs may route the ICMP packets sent by standard traceroute differently from the HTTP traffic we’re concerned about. Therefore it might be desirable to use tcptraceroute with 80 as the destination port for this purpose.

Anyway, I’d be very interested if people want to post their experience with detection of this on different ISPs in the comments.

Edit 18/04/2011: These techniques are actually detecting the presence of a transparent proxy. It is the use of a transparent proxy which causes all these symptoms. Some ISPs use transparent proxies not only for implementing the IWF blacklist, but for other purposes too. One such purpose is caching, since this decreases the ISP’s usage of networks upstream, therefore decreasing their bills. This practice has been prevalent in the past. I was under the impression that it had largely fallen out of favour, but it has been suggested that Virgin Media Broadband use this technique for caching purposes. I have not confirmed this yet.

Edit 18/04/2011: Virgin Media Broadband do not use transparent proxies for caching any more, although they definitely have in the past. Keep in mind that it is possible that your ISP is using a transparent caching proxy though. Although, if they are then to be honest you really should stop using an ISP stuck in the nineties.

Edit 19/04/2011: It looks like there’s a site called censorleaks.com which may be able to tell you automatically whether a site has been IWF’d. I can’t vouch for the site’s accuracy. I believe the name is a misnomer; they are probably using a technique similar to one of the ones outlined here on an ISP known to implement the IWF blacklist, rather than having direct access to a copy of the IWF blacklist.

People within the IWF must view illegal content to verify whether it should be taken down. Can’t they be arrested?

Apparently, and this is nth hand information where n≥3, there are six people in the UK who have special permission to view this content and the verification is done in a sealed room. So no.

Edit 17/04/2011: As is so often the case with such hearsay, this is incorrect, or at least not the full story. Rather the police have published a Memorandum of Understanding which is more general purpose. Thanks to mkb for pointing this out. The IWF are mentioned in particular in the memo so it’s possible the original statement related to there being six employees of the IWF who do the actual checking.

What can I do about it?

As a workaround, the usual trick of using your own proxy based somewhere outside IWF land works. This includes other countries, but also some ISPs within the UK, since IWF blocking doesn’t necessarily apply to all Internet providers within the UK. The IWF have a list of companies who receive their blocklist. As a rule of thumb, most home ISPs have it, but some commercial and educational ISPs may not. For example, JANET does not subscribe to the IWF blocklist and therefore the situation will vary from university to university.

You are unlikely to have much luck with public proxies since in this case, again, you are sharing an IP with many other people. Still, given there are so many public proxies, you may have some luck if you manage to find an unpopular one.

Longer term, it is important to keep in mind that ISPs subscribe to the IWF voluntarily. The reason they even bother at all is that there’s a large lobby, including politicians and tabloids [3], who conflate being able to access certain URLs with the act of abusing a child. The only way to counter a large lobby is to create your own. This is left as an exercise for the reader.

  1. The actual interval is probably larger, but this was the only time I was monitoring. As of writing it seems like it may still be in effect for filesonic.
  2. Well, maybe not, since this might result in over-aggressive blocking where there isn’t a one-to-one mapping between IPs and domains/web sites, so DNS manipulation may be preferable. Of course in that case the website may still be accessible by visiting the IP address and manipulating the Host header. Anyway, I digress.
  3. I include The Observer in this, who may very well have helped pave the way for bringing the IWF into existence with their mud-flinging claim that Clive Feather (a director of Demon) “provides paedophiles with access to thousands of photographs of children being sexually abused” by providing unfiltered USENET access.

Being a member of a new Unix group without restarting X

Normally to see the effects of adding a user to a group, all sessions that user is logged into currently must first be closed. This includes X. A quick and dirty workaround for adding yourself to optical in a single terminal follows.

~> groups
wheel games video audio users
~> newgrp optical
~> newgrp users
~> groups
wheel games video audio optical users
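The reason this works is that group membership is a property of each running process, fixed when the session started. A quick way to see what a given process actually has is a sketch like this (Unix only; the function names are mine):

```python
import grp
import os


def current_gids():
    """Group IDs the current process really holds (effective + supplementary)."""
    return set(os.getgroups()) | {os.getegid()}


def process_in_group(group_name):
    """True if this process holds the named group; False if not, or if no such group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False
    return gid in current_gids()
```

Run from a terminal started before you were added to optical, process_in_group("optical") should say False even though /etc/group already lists you.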

Some bookmarklets

I made a couple of bookmarklets to scratch itches in the way of hard to read web pages (Yeah, I know white ninja is meant to be like that. Doesn’t make the text any easier to read on a laptop with masses of glare.)

Readable width

According to some publisher type’s research, 30ems is near most people’s optimal line width for reading. This bookmarklet is used by clicking on some text that’s too wide, then clicking again and again until it works. (UI design award, Powerpoint applause)

Black text

Click some of that grey text that designers like to use to try and get a more ‘anti-aliased look’ according to some article or other (yes, this design is guilty of that too) to get it to #000, the way some figure I can use to pretend I have a fundamental view on the subject intended.

Oh and there are no pre-compression versions. I started tweaking whitespace impaired versions like an idiot. Some excuse for being crappy.

Woo? (Powerpoint applause)

Where’s my vi <textarea>?

Is this it? Well, if you think opening another tab in gvim, making me switch back and forth and keeping stuff in sync behind my back counts then well, you have low expectations.

How about this?

no particular redistribution rights are granted;

that means you do not have the right to use this on your own web site.

exceptions: Internet Connection, Inc. customers may use this application on any site they like. They may not grant redistribution or other rights to others. they may do this even if they (sadly) become no longer customers.

other rights may become available in the future.

http://src.internetconnection.net/vi/

I can use it if I buy hosting off a provider named “Internet Connection, Inc”? Really, other rights in the future? On some code served from the domain gpl.internetconnection.net? Ambassador, you’re ruining us.

Fine, fine. I’ll get over it. So, yeah, this?

While JS/UIX is open-source (JS-files and HTML embedding must be) it is not
public domain. All rights reserved (c) mass:werk, N. Landsteiner 2003.
You may download the files for private use, but you must not publish, serve or
provide this system in any form without the positive confirmation of the
author. All changes to the source code must be authorized. No warranty of any
kind is granted.

Disclaimer: JS/UIX is provided free of charge and on an “as is” basis, without
warranties of any kind, expressed or implied. Licensors have no liability with
respect to use of the product. The entire risk as to the quality and perfor-
mance of the product is borne by licensee, who assumes the entire cost of any
service and repair. This disclaimer of warranty constitutes an essential part
of this agreement. No use of the product is authorized hereunder except under
this disclaimer.

JS/UIX is NOT a free software (for reasons see the FAQ). If you are looking for
a powerfull but easy to use terminal interface have a look at
“mass:werk termlib.js”

Better put than the last one at least. But what’s the reasoning behind it?

The strict licence is mainly due to the very nature of JS/UIX by now:
Since any malformed code could corrupt the whole system, a greater variety of
distributed copies would render it merly impossible to secure any bugfree
version. Plans are for a totally rebuild version with true multitasking and a
secure userland domain. As this new architecture would allow for third party
extensions, this future version of JS/UIX could well be the start of an open
project under a more permissive licence.

Oh, I see. If you licensed your “Operating System” openly then there would instantly be chaos. Just about everyone and their dogs would fork it. They’d all write masses of really buggy code ‘corrupting the whole system’. Every time anyone wanted a toy operating system they’d always opt for these broken forks rather than the original. Oh, and once again, we can look forward to freedom sometime in the future.

Right, fine. Aha! This one is even encouraging me to download it. Hey, it has command mode, search and replace.

h, j, k, l, h, j, k, l — “Key not supported”! But you have vi *in* your name, handface.gif.

Onwards. Hmm? Wait, even ‘o’ doesn’t work.

Never mind. Ho hum. No commands, no search, the cursor sometimes looks like it’s in places where it isn’t, there’s nothing telling you what mode you’re in, no numbers preceding keys. Apart from that pretty good. The best of the freely licensed ones in my opinion.

None of the freely licensed ones are quite good enough for everyday use. That said, I’m amazed nobody’s packaged one of them up as a greasemonkey script or the like.

Any I’ve missed?

(Oh, and from a non-textarea point of view vimperator is pretty good.)

Firefox plugin priority order

As far as I can see, the priority in which plugins handle a piece of media when more than one plugin is capable of doing so is undocumented in Firefox, and there is no way to set it at about:plugins. This was rather irritating, as the gxine starter plugin is nowhere near as good as the totem plugin, yet it handles more mime types, so it would make a great backup plugin. So how is priority determined? Simple: the most recently modified plugins are used in preference to older ones. Again, this isn’t documented.

So, in *nix all you should have to do is open up a terminal window and type:
cd /usr/lib/firefox/plugins
sudo touch -d 01/01/95 <lower priority plugin>

In my case <lower priority plugin> was gxineplugin.so.

In windows you can download eXpress TimeStamp Toucher, open it and choose an old date and browse to the lower priority plugin which should be in C:/Program Files/Mozilla Firefox/Plugins/ by default.

You can confirm that it has worked by going to about:plugins in Firefox and making sure your lower priority plugin is the furthest down the page of the plugins.
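If you would rather not hunt for a touch tool on each platform, the same timestamp change can be sketched in Python, which works on both *nix and Windows. The function names are mine, and newest_first only models the ordering described in this post; it is not anything Firefox exposes.

```python
import os
from datetime import datetime


def make_plugin_look_old(path, year=1995):
    """Set the access and modification times of path to 1 January of year."""
    stamp = datetime(year, 1, 1).timestamp()
    os.utime(path, (stamp, stamp))
    return stamp


def newest_first(paths):
    """Order plugin files the way the post says Firefox prefers them."""
    return sorted(paths, key=os.path.getmtime, reverse=True)
```

You will probably need to run it with enough privileges to modify files in the plugins directory.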

Comment and let me know if this works for you too.

Django on dreamhost problem

***Update***

Looking around I found this post that puts forward what is a much better solution. It suggests renaming django.fcgi to dispatch.fcgi because dreamhost has a policy of not killing things called dispatch.fcgi. I’ve updated the dreamhost wiki page on django to mention this.

What’s below is my old solution:

Primary problem

My mum’s site is set up using Django on Apache running in a shared hosting environment with Dreamhost, as laid out in the official documentation and my own write-up. I was having a few problems with it. The problem was an intermittent, unpredictable error about a third of the time which led to /internal_error.html being displayed. This means that Django wasn’t even starting, or was starting and failing very early on, because if it had started properly it would have displayed the 500.html template instead of /internal_error.html. I took a look at the error.log and sure enough there were errors in the following format for each failed request:
[time and date] [error] [client xxx.xxx.xxx.xxx] (104)Connection reset by peer: FastCGI: comm with server "/home/grimboy/example.com/django.fcgi" aborted: read failed
[time and date] [error] [client xxx.xxx.xxx.xxx] FastCGI: incomplete headers (0 bytes) received from server "/home/grimboy/example.com/django.fcgi"

Solution

After much frustration at these weird, unexplained errors I was just changing random stuff to see if anything increased reliability. In the fastcgi python script that does setup and then calls runfastcgi and lives inside the website root (I call it django.fcgi) I changed:
runfastcgi(method="threaded", daemonize="false")
to
runfastcgi(method="prefork", daemonize="false")
which seemed to mysteriously fix the errors.

Secondary problem

I have no idea how fastcgi really works, so I don’t know why changing the method from threaded to prefork solved my problem. However, if I’m not completely misinformed, fork() starts an additional process, whereas the threaded method uses, well, threads. Threads are more lightweight than processes, so it should be preferable to use threads.
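The process/thread distinction itself is easy to demonstrate, even if it doesn’t explain the Dreamhost behaviour. This is a generic Unix-only sketch with made-up function names; it has nothing to do with how flup or Django’s fastcgi runner is written.

```python
import os
import threading


def pid_seen_by_thread():
    """Run os.getpid() in a new thread; threads live inside the same process."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(os.getpid()))
    worker.start()
    worker.join()
    return seen[0]


def pid_seen_by_forked_child():
    """Run os.getpid() in a forked child and report it back through a pipe."""
    read_end, write_end = os.pipe()
    child = os.fork()
    if child == 0:
        # In the child: a separate process with its own PID.
        os.close(read_end)
        os.write(write_end, str(os.getpid()).encode())
        os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(child, 0)
    return int(data)
```

The thread reports the parent’s own PID, while the forked child reports a different one, so a host that watches or kills processes sees the two methods quite differently.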

In conclusion, I am left with two questions:

  • I have no idea why this fixes the original problem and plead ignorance. Anyone care to enlighten me?
  • I don’t know if this problem is unique to me. It could be that the original problem is caused by something else I’m doing wrong. Has anyone else had this problem?

Vim is teh awesombe!

Vim
photo by lilit

I recently started using vim as my primary text editor. I’m no expert with it (yet); I’m just posting some advice for other people who want to do the same. But first, I’m going to tell you why vim is good.

  • Fast in terms of loading
  • Fast in terms of usage
  • Customisability
  • Expandability
  • Portability
  • Makes you feel all tingly when you do something in just a few keystrokes

Lots of people reject the idea of heavily keyboard and (human) memory reliant editors such as Emacs or vim because they have a steep learning curve. However, I’ve found vim’s learning curve to be pretty reasonable, and if you code professionally or even as a regular hobby then it’s worth taking the time to learn an editor that will speed you up.

Coffee & Cream
photo by hulksjedi

There’s a nice little vim configuration thing called Cream. From what I understand this comes in two components:

  1. A number of configurations and macros for normal vim
  2. Cream itself (still vim underneath, but even milder, vim on tranquillisers)

I find this quite useful in that it makes normal vim behave a little bit more with its macros. However, Cream itself is too mild, it’s like using a normal text editor.

So, in conclusion, Cream is good. Download it and install it, but use vim configured by Cream rather than Cream itself. Well, actually you can use Cream itself; in fact it’s a nice gentle introduction to vim, and if you just want a normal text editor then I recommend it. I used Cream itself for a while in “expert mode” before trying to customise it and failing. One thing that annoyed me in particular while coding python was that “expand tabs” worked for a single document, then claimed it was on but had to be turned off and on before it would start working again. I also wanted to be able to have the exact same configuration everywhere. This includes stuff like SSHing into Dreamhost and doing a hot-fix on a website or messing around on my old computer underneath the desk™ that does odd jobs like backups, svn, trac and an IRC bot. I had trouble doing configuration for Cream itself and I didn’t feel in control. So really, Cream-configured or not, (g)vim itself was the only option for me.

School children in Sanorgaon
photo by phitar

Anyway, let’s rewind a bit. If you want to learn vim use the tutor. Invoke the tutor by opening a terminal or a dos prompt or whatever and entering vimtutor in it (this assumes vim is in your path). Do the entire tutor, then do it again. This is the only way. You probably won’t find it difficult but you may very well find yourself thinking “this is pointless, weird and archaic” every now and then. I know I did. However, your brain will know when to use various keystrokes magically [1]. Apart from the tutor I found this charming IRC style tutorial quite useful for a bit of light revision later on. On top of that there’s always the official vim tips repository which is useful for specific pieces of insight. This tip is particularly useful in a general sense as it is a compilation.

A Contraption
photo by tojosan

Ok, so now that you know this little editor that is fast in all senses of the word you’re probably wondering about one of its most loved features, its customisability. Firstly there is the .vimrc (or _vimrc on windows). This is something that I did mostly by example, there are a number of heavily commented .vimrc files out there so it’s fairly easy to just read through them until an option catches your eye. Here are two of my favourites. Obviously if you want to customise something in particular then you can search through help or fall back to teh interwabs.

Here’s mine:
" Kill all the tabs.
set ts=4
set sw=4
set et
set nu
set sr

" use <Ctrl>+N/<Ctrl>+P to cycle through tabs (the gui kind):
nnoremap <C-N> :tabnext<CR>
nnoremap <C-P> :tabprev<CR>

" autoindenting
set ai
" smartindenting
set si
" a <Tab> in an indent inserts 'shiftwidth' spaces (not tabstop)
set smarttab
" if non-zero, number of spaces to insert for a <Tab>
set softtabstop=4
" no real wrap during insert
set tw=0

" have the h and l cursor keys wrap between lines (like <Space> and <BkSpc> do
" by default), and ~ convert case over line breaks
set whichwrap=h,l,~,[,]

" allow <BkSpc> to delete line breaks, beyond the start of the current
" insertion, and over indentations:
set backspace=eol,start,indent

" have <Tab> (and <Shift>+<Tab> where it works) change the level of
" indentation:
inoremap <Tab> <C-T>
inoremap <S-Tab> <C-D>
" [<Ctrl>+V <Tab> still inserts an actual tab character.]

" map 'F12' to change the pwd of vim to the cwd of the current file
noremap <F12> :cd %:p:h<CR>

" pyLint the python 'compiler'
autocmd FileType python compiler pylint

" Purdy color scheme
colorscheme inkpot

Disassembled Plug
photo by jm3

Plugins are the next thing. This is pretty obvious, just get some from the official vim plugin repository and shove them in ~/.vim/plugin/ (or C:\Program Files\Vim\vimfiles\plugin\ on windows) and shove colour schemes in ~/.vim/colors/. My favourite plugins at the moment are (keep in mind that some of these are geared toward python/html stuff):

  • Colours Sample Pack – brings variety to the editing experience. I am quite settled on inkpot, although matrix is good if someone else is in the room.
  • Subversion (svn) Integration Plugin, update with stupid star trek name – Means I don’t have to switch into a terminal as often. (Well I’d probably use :!svn … if I didn’t use this)
  • python.vim – Some python related menu commands. You need to shove the following in your ~/.vimrc: au FileType python source ~/.vim/plugin/python.vim
  • runscript.vim – Again, saves me doing a terminal or !python moo.py …
  • Vim Taglist – Turns vim into a source code browser. Endlessly useful for large files.

The Halls of Stanford
photo by akash_k

Finally, I said at the beginning that one of my requirements was to have the same configuration across several machines. What I’ve done is made a vim repository in svn (I use version control for everything) which I check out on all the machines I want to have configured, then svn co and ln -s bits of it to various directories (on windows I just check out bits of the tree directly). I have the plugin and color directories in the repository. I also have a directory called common. When I said I was showing you my .vimrc I was actually showing a file called /common/vimrc in the vim repository. I then source it from all of my .vimrc files. On Ubuntu on my laptop my .vimrc looks like this [2]:

source $VIMRUNTIME/vimrc_example.vim
source $VIMRUNTIME/mswin.vim
behave mswin

source $HOME/.vim/common/common.vim
set guifont=ProFontWindows\ 9

(Yes, I use windows shortcuts on linux)
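The check-out-then-symlink step described above can be automated on each *nix machine with something like the following. This is only a sketch: the directory names passed in are guesses at a sensible layout, not the exact repository structure.

```python
import os


def link_vim_config(checkout_dir, home_dir, names=("plugin", "colors", "common")):
    """Symlink each named directory of the checkout into home_dir/.vim."""
    vim_dir = os.path.join(home_dir, ".vim")
    os.makedirs(vim_dir, exist_ok=True)
    created = []
    for name in names:
        target = os.path.join(checkout_dir, name)
        link = os.path.join(vim_dir, name)
        # Skip directories missing from the checkout and links that already exist.
        if os.path.isdir(target) and not os.path.lexists(link):
            os.symlink(target, link)
            created.append(link)
    return created
```

Running it a second time does nothing, so it is safe to call after every svn update.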

Sunset de la Pollution
photo by tengis

In conclusion, I’m very happy with vim. It has replaced notepad++ on Windows and a different editor every month on Linux. I like controlling everything with the keyboard; a keyboard is ordered. A mouse is chaotic and can easily just wander off into Firefox onto a Wikipedia article about conspiracy theories, or something similarly stupid. Also I’ve never quite been satisfied with using dialogs to do stuff (e.g. open files, change font). gvim provides this option, but I can just as easily do this by typing if I already know what I want [3]. Also being able to execute terminal/dos commands directly from an editor is unspeakably convenient. If you’re learning vim and want advice or have a question, or if you are a vimmer already and want to correct me on anything I’ve said, then scratch the itch and leave a comment.

  1. No, seriously.
  2. I’m not lying this time.
  3. I think using dialogs is like window shopping. Prompts are like entering a number into the computer thing in Argos [4].
  4. Nation specific references and corporate endorsement all in one. I must be turning into an evil.

A guide to django on dreamhost (and django deployment in general) and my experience so far

Dreamhost is one of the few large non-VPS shared hosting companies that is currently Django compatible. I’ve found that it hasn’t been too hard to set up either. This little guide assumes you’ve read this guide and is mainly just caveats and tips followed by a bit of personal experience on reliability.

MySQL

This is easy. First go to your dreamhost panel. Next, select goodies, then manage MySQL. Now create a database; I’d recommend django as a name. Then create a hostname and a user for the database. The hostname could be something like mysql.yourdomain.com, and the user can be whatever you want. Now in your settings.py change the DATABASE_* options to something like these:

DATABASE_ENGINE = 'mysql'
DATABASE_NAME = 'django'
DATABASE_USER = 'user'
DATABASE_PASSWORD = 'pass'
DATABASE_HOST = 'mysql.yourdomain.com'
DATABASE_PORT = '3306'
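If you are following along with a newer Django (1.2 or later), the separate DATABASE_* settings were replaced by a single DATABASES dictionary. A rough equivalent of the block above, using the same placeholder values, would be:

```python
# settings.py fragment for Django 1.2+; the values are the same placeholders as above.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "django",
        "USER": "user",
        "PASSWORD": "pass",
        "HOST": "mysql.yourdomain.com",
        "PORT": "3306",
    }
}
```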

Email

Under Mail > Manage Email in the dreamhost panel you need to set up an email address with a mailbox. At first I was making the fatal error of only having an address that just redirects email. Then enter the info in settings.py:

EMAIL_HOST = 'mail.yourdomain.com'
EMAIL_HOST_USER = 'addy@yourdomain.com'
EMAIL_HOST_PASSWORD = 'password'

Statistics

Dreamhost normally gives some rather nice statistics on all of your domains at /stats/. You lose this when you get Django to handle the whole domain. Let’s take a wee look at the .htaccess file:

AddHandler fastcgi-script .fcgi
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ django.fcgi/$1 [QSA,L]

Currently everything except for files is being handled by Django. If you have Django serve the whole of your domain you lose /stats/, as this isn’t a file. We want to make another exception, especially for /stats/. We want to do the same thing for failed authentication. We can put an extra rule before the Django one like so.

RewriteCond %{REQUEST_URI} ^/stats/(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^/failed_auth.html$
RewriteRule ^.*$ - [L] 

This rule tells Apache to just keep these two URLs as they are and not do anything with Django.
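To check you have understood the order of the rules, here is the same routing logic mimicked in Python. It is a simplification for illustration only: the function name is made up, and the real !-f test looks at paths on disk (REQUEST_FILENAME), which is modelled here by a set of request paths that map to existing files.

```python
import re

# The two patterns from the extra rule, which is tried first.
PASS_THROUGH = [re.compile(r"^/stats/(.*)$"), re.compile(r"^/failed_auth.html$")]


def handler_for(request_uri, existing_files=()):
    """Return 'apache' or 'django' for a request path, mimicking the rules."""
    # First rule: /stats/... and /failed_auth.html are left untouched,
    # and [L] stops any further rewriting.
    if any(pattern.search(request_uri) for pattern in PASS_THROUGH):
        return "apache"
    # Second rule only fires when the request is not a real file.
    if request_uri in existing_files:
        return "apache"
    return "django"
```

So /stats/ and /failed_auth.html stay with Apache, real files such as your media are served directly, and everything else ends up at django.fcgi.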

Keeping in sync with the development version

Often a website will have more than one revision. Maybe you’ll find there are some bugs in your website. Maybe you’ll want to add some new features. My recommendation here is to start a subversion repository for your project. Actually doing this is out of the scope of this guide (but there’s an excellent book) but here are some tips.

  • Have a trunk and then periodically copy it to a tag and do a checkout of that tag at the dreamhost end.
  • Keep the media in a directory within the source control and make a symbolic link to it from the actual website directory. Have uploads as a real directory under the domain directory and shove all the uploads there, otherwise you’ll get loads of clutter in your working copy.
  • Tell subversion to ignore settings.py (as well as *.pyc) so that you can set up individual settings for your development and production environments.

Making backups

Dreamhost back up all of your stuff in case bad stuff goes down over there. However, personally, I feel safer having backed up stuff myself as well.

SSH into dreamhost and make a file called db_dump in ~/ and have it contain

mysqldump --opt -u user -ppass -h mysql.yourdomain.com django > ~/db_backups/django.sql

as well as a variation on that line for any non-django databases you might have. Run chmod 755 db_dump; mkdir db_backups then crontab -e, now enter something like:

MAILTO="your@email.com"

00 09 * * * /home/you/db_dump

Note that I’ve chosen 9:00 daily. Dreamhost is at UTC-7. I am at either UTC+0 or UTC+1 depending on daylight saving time, which means that I am either 7 or 8 hours ahead. So while this script is dumping the database it is 4 or 5pm in sunny York. Do the same calculation for yourself if you’re in a non-dreamhost timezone.
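If mental timezone arithmetic isn’t your thing, Python can do it. This sketch uses the fixed offset quoted above for Dreamhost and takes your own offset as a parameter; the function name and the date used are arbitrary.

```python
from datetime import datetime, timedelta, timezone

DREAMHOST = timezone(timedelta(hours=-7))  # server offset quoted in the text


def local_time_of_dump(local_offset_hours, dump_hour=9):
    """Wall-clock time at your end when the server's clock reads dump_hour:00."""
    dump = datetime(2000, 1, 1, dump_hour, 0, tzinfo=DREAMHOST)  # arbitrary date
    local = timezone(timedelta(hours=local_offset_hours))
    return dump.astimezone(local).strftime("%H:%M")


print(local_time_of_dump(0))  # 16:00
print(local_time_of_dump(1))  # 17:00
```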

Do this next bit on a local machine. If you’re using windows start by downloading wget and putting it in your path. Make a file called backup_dh (or backup_dh.bat in windows) somewhere and have it contain something like

cd dh_backup
wget ftp://user:pass@yourdomain.com/db_backups/*

cd yoursite1_uploads
wget ftp://user:pass@yourdomain.com/yoursite1.com/uploads/*
cd ..
cd yoursite2_uploads
wget ftp://user:pass@yourdomain.com/yoursite2.com/uploads/*
cd ..

Run chmod 755 backup_dh (*nix only), mkdir dh_backup; cd dh_backup; mkdir yoursite1_uploads; mkdir yoursite2_uploads.

Then on *nix crontab -e, enter (or append) something like:

  30 17 *   *   *     /home/you/backup_dh

On windows Start > Control Panel > Scheduled Tasks > Add Scheduled Task > Next > Browse… > look for backup_dh.bat > Select daily > Choose a time – for me it’s 17:30 > Enter your username and password > Finish.

Notice that this download is scheduled to happen at 17:30 in sunny York, between half an hour and an hour and a half after dreamhost did its database dump. Calculate a similar thing yourself.

Optimising so dreamhost don’t kill you

Officially you’re safe with dreamhost if you use under 60 cpu minutes (that’s 3600 seconds). Anything over that will be evaluated on a case by case basis. You should take a look at http://yourdomain.com/stats/resources/yourunixname.sa.analysed.0 and see if you are getting too near that. If you are, don’t despair: luckily Django comes with a built-in cache framework. A few pointers:

  • Make sure whatever you do, to turn on CACHE_MIDDLEWARE_ANONYMOUS_ONLY, otherwise the admin will start playing silly buggers.
  • I would choose local memory caching for any smallish sites. Otherwise use db caching (which, from what I understand, should always be just as fast as file-system caching on dreamhost, as both files and the database are stored on machines other than the one running the Apache instance).
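As a rough sketch, the pointers above translate into settings.py lines like these. The names are the old-style cache settings from the Django releases of this guide’s era (later versions renamed or removed them), and the timeout and table name are example values only.

```python
# settings.py fragment: old-style Django cache settings (example values).
CACHE_BACKEND = "locmem:///"            # local-memory cache for a smallish site
# CACHE_BACKEND = "db://django_cache"   # database cache; table name is an example
CACHE_MIDDLEWARE_SECONDS = 600          # example: cache each page for 10 minutes
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True  # don't cache pages for logged-in users
```

You also need django.middleware.cache.CacheMiddleware in your middleware list for the per-site cache to do anything.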

Also, Dive Into Python has an amazing chapter on performance tuning.

Thoughts

So far I’ve been fine, but I really haven’t got that much traffic yet (not even on my mum’s fantastic writing site). However, I’m sure that will all change when I launch my new web application idea thing. So I’ll report back on this later.

Finally…

I’m sure I’ve made some mistakes here so please comment and correct me. Also please comment if you’re having trouble setting up Django on dreamhost.