Revision as of 17:15, 28 February 2013

jail in FreeBSD

Every FreeBSD VPS is a jail. A jail is an artificial set of attributes added to a set of processes that bind them together and separate them from other processes on the system.

Do not be confused - there is no virtualization or virtual machine going on here at all. As far as the base FreeBSD system is concerned, it is simply running a whole bunch of processes. There is almost zero overhead in creating a jail around a set of processes. That is to say, if there are 10 jails that each have 10 httpd processes in them, the performance will be almost exactly the same as if there were just a single FreeBSD system running 100 httpds.

You can, however, tell from the output of ps auxw which processes are in a jail and which processes are not. All processes that are inside of a jail have a 'J' in the STAT column of ps auxw. Now, on a production jail server, the underlying system is only running about 20 processes - things like sshd, cron, and syslogd. So on a fully loaded jail system with 900 or more processes, only 20 or so would not have a 'J' in the STAT column.

It is nice to know which processes belong to the underlying server. You can see them by running:

ps auxwJ

(we’ve patched ps on some older 4.x servers) or

ps auxw | grep -v J

Since you know that a process without a J belongs to the base system, you can HUP your own sshd or restart cron on the base system without touching a customer's. (There may be 30 more syslog processes on the system as a whole, so if it weren't for this, it would be hard to differentiate yours from all the others.)
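One caveat with the plain grep: it drops any line that has a capital J anywhere on it, not just in the STAT column (for example a user or command name containing a J). Here is a minimal sketch that tests only the STAT column. It assumes the usual ps auxw layout where STAT is the eighth field; the function names are made up for illustration.

```shell
# Sketch: split `ps auxw` output into base-system and jailed processes by
# testing only the STAT column (assumed to be field 8).
# Both functions read ps output on stdin. The names are hypothetical.

base_procs() {
    # Keep the header line and every process whose STAT has no J.
    awk 'NR == 1 || $8 !~ /J/'
}

jailed_procs() {
    # Keep the header line and every process whose STAT contains a J.
    awk 'NR == 1 || $8 ~ /J/'
}

# Usage on a jail host:
#   ps auxw | base_procs
```

Because the command column comes last and may contain spaces, only the first eight fields are relied on here.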

However, the J only tells you that the process is in a jail - not which jail it is in. To tell what jail a process belongs to, you need to find its PID (in top or ps auxw) and then run:

jpid <pid>

or

cat /proc/<pid>/status

Here is an example:

jail1# jpid 4137
java 4137 1 3959 0 5,9 noflags 1103567686,299476 14192,584098 77838,429671 nochan 2530 2530 10005,10005,10005 www.transelement.net

(If you need to find a proc, ps wp <pid> will find the path to the executable)

As you can see, the last field in that single line of output is www.transelement.net – so that is the system the process belongs to. You could then:

g www.transelement.net

or

grep "www.transelement.net" /usr/local/jail/rc.d/?????

and you would get:

/usr/local/jail/rc.d/quad1:jail /mnt/data1/69.55.239.59-col00145-DIR www.transelement.net 69.55.239.59 /bin/sh /etc/rc
/usr/local/jail/rc.d/safe1:jail /mnt/data1/69.55.239.59-col00145-DIR www.transelement.net 69.55.239.59 /bin/sh /etc/rc

and you would see the jail command line from both quad1 and safe1 – you would then know the customer number as well, which is col00145.
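The two lookups above (hostname from the status line, customer number from the directory name) can be sketched as small helpers. This is only an illustration under the assumption that the formats are exactly the ones shown: the hostname is the last field of the status line, and the directory is named <ip>-<colNNNNN>-DIR. The function names are hypothetical.

```shell
# Sketch: pull the jail hostname out of a jpid or /proc/<pid>/status line,
# and the customer number out of a jail directory path.

jail_host_of_status() {
    # Status line on stdin; the hostname is the last whitespace-separated field.
    awk '{ print $NF }'
}

customer_of_path() {
    # $1 = path such as /mnt/data1/<ip>-<colNNNNN>-DIR; prints the col part.
    printf '%s\n' "$1" | sed -n 's/.*-\(col[0-9][0-9]*\).*/\1/p'
}

# Usage:
#   cat /proc/4137/status | jail_host_of_status
#   customer_of_path /mnt/data1/69.55.239.59-col00145-DIR
```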

So, let's take a look at how a system is started. The jail command line consists of the `jail` command and _four_ arguments:

jail (target_directory) hostname IP (command)

So, in the case of www.transelement.net, we see that the target directory is:

/mnt/data1/69.55.239.59-col00145-DIR

the hostname is: www.transelement.net, the IP is 69.55.239.59

and the command is: /bin/sh /etc/rc

Now, that may look like two commands, but it is not - we are interpreting the shell script /etc/rc with /bin/sh - much in the same way that you might run:

perl script.pl

It is important to note that when you see the command:

/bin/sh /etc/rc

both /bin/sh and /etc/rc are inside the target system - so the jail command will fail, and the system will not start, if that person's system does not have /bin/sh or /etc/rc. The actual /bin/sh and /etc/rc on the underlying system are of no use.
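To make the four arguments concrete, here is a rough sketch that splits a jail command line into its parts and checks that /bin/sh and /etc/rc exist inside the target directory. It assumes the line has exactly the layout shown above; the function names are invented for this example, and nothing here starts a jail.

```shell
# Sketch: take apart a "jail <dir> <hostname> <ip> <command...>" line.

parse_jail_line() {
    # $1 = the full command line as one string.
    set -- $1                      # word-split: 1=jail 2=dir 3=hostname 4=ip 5..=command
    [ "$1" = "jail" ] && [ $# -ge 5 ] || return 1
    echo "dir=$2"
    echo "hostname=$3"
    echo "ip=$4"
    shift 4
    echo "command=$*"
}

check_jail_files() {
    # $1 = jail directory. The files must exist inside the jail itself,
    # the copies on the base system do not count.
    [ -x "$1/bin/sh" ] || { echo "missing or not executable: $1/bin/sh"; return 1; }
    [ -f "$1/etc/rc" ] || { echo "missing: $1/etc/rc"; return 1; }
}

# Usage:
#   parse_jail_line 'jail /mnt/data1/69.55.239.59-col00145-DIR www.transelement.net 69.55.239.59 /bin/sh /etc/rc'
#   check_jail_files /mnt/data1/69.55.239.59-col00145-DIR
```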

/etc/rc.conf

We will start with some basic FreeBSD essentials that you must be aware of.

First off, the absolute most important file on any of our FreeBSD systems is /etc/rc.conf

This is the main startup configuration file for all FreeBSD systems. Not only does it contain startup directives that direct the start process to fire off certain processes, but it also contains the hostname, IP address, default gateway and all additional IP aliases for the FreeBSD system.

Note that it does not contain the nameservers - those are in /etc/resolv.conf, just like any other UNIX OS.

So, let's take a look at a production machine, jail9 in this case, and its /etc/rc.conf:

hostname="jail9.johncompanies.com"
kern_securelevel_enable="NO"
nfs_reserved_port_only="YES"
sendmail_enable="NO"
sshd_enable="YES"
syslogd_flags="-ss"
portmap_enable="NO"
rand_irqs="9 10 11 13 14"
inetd_enable="YES"
inetd_flags="-Ww -a 10.1.4.109"
defaultrouter="69.55.237.1"
nfs_client_enable="YES"
nfs_client_flags="-n 4"

ifconfig_fxp1="10.1.4.109 netmask 255.255.255.0"
static_routes="t1 office"
route_t1="-net 10.1.5 10.1.4.2"
route_office="-net 10.1.6 10.1.4.2"
ifconfig_fxp0="inet 69.55.237.129 netmask 255.255.255.0"

ifconfig_fxp0_alias0="inet 198.78.65.130 netmask 255.255.255.0"
ifconfig_fxp0_alias1="inet 198.78.65.131 netmask 255.255.255.255"
ifconfig_fxp0_alias2="inet 198.78.65.132 netmask 255.255.255.255"
ifconfig_fxp0_alias3="inet 198.78.65.133 netmask 255.255.255.255"
ifconfig_fxp0_alias4="inet 198.78.65.134 netmask 255.255.255.255"
ifconfig_fxp0_alias5="inet 198.78.65.189 netmask 255.255.255.255"
ifconfig_fxp0_alias6="inet 198.78.65.136 netmask 255.255.255.255"
ifconfig_fxp0_alias7="inet 198.78.66.222 netmask 255.255.255.0"
ifconfig_fxp0_alias8="inet 198.78.65.138 netmask 255.255.255.255"
ifconfig_fxp0_alias9="inet 198.78.65.139 netmask 255.255.255.255"
ifconfig_fxp0_alias10="inet 198.78.65.140 netmask 255.255.255.255"
ifconfig_fxp0_alias11="inet 198.78.65.141 netmask 255.255.255.255"
ifconfig_fxp0_alias12="inet 198.78.65.142 netmask 255.255.255.255"
ifconfig_fxp0_alias13="inet 198.78.65.143 netmask 255.255.255.255"
ifconfig_fxp0_alias14="inet 198.78.65.144 netmask 255.255.255.255"
ifconfig_fxp0_alias15="inet 198.78.65.145 netmask 255.255.255.255"
ifconfig_fxp0_alias16="inet 198.78.65.146 netmask 255.255.255.255"
ifconfig_fxp0_alias17="inet 198.78.65.147 netmask 255.255.255.255"
ifconfig_fxp0_alias18="inet 198.78.65.148 netmask 255.255.255.255"
ifconfig_fxp0_alias19="inet 198.78.65.149 netmask 255.255.255.255"
ifconfig_fxp0_alias20="inet 198.78.65.150 netmask 255.255.255.255"
ifconfig_fxp0_alias21="inet 198.78.65.151 netmask 255.255.255.255"
ifconfig_fxp0_alias22="inet 198.78.65.160 netmask 255.255.255.255"
ifconfig_fxp0_alias23="inet 198.78.65.153 netmask 255.255.255.255"
ifconfig_fxp0_alias24="inet 198.78.65.159 netmask 255.255.255.255"
ifconfig_fxp0_alias25="inet 198.78.65.155 netmask 255.255.255.255"
ifconfig_fxp0_alias26="inet 198.78.65.156 netmask 255.255.255.255"
ifconfig_fxp0_alias27="inet 198.78.65.157 netmask 255.255.255.255"
ifconfig_fxp0_alias28="inet 198.78.65.158 netmask 255.255.255.255"
ifconfig_fxp0_alias29="inet 69.55.237.148 netmask 255.255.255.255"
ifconfig_fxp0_alias30="inet 69.55.237.139 netmask 255.255.255.255"
ifconfig_fxp0_alias31="inet 69.55.237.145 netmask 255.255.255.255"
ifconfig_fxp0_alias32="inet 69.55.237.158 netmask 255.255.255.255"
ifconfig_fxp0_alias33="inet 69.55.237.146 netmask 255.255.255.255"
ifconfig_fxp0_alias34="inet 69.55.237.153 netmask 255.255.255.255"
ifconfig_fxp0_alias35="inet 69.55.237.133 netmask 255.255.255.255"
ifconfig_fxp0_alias36="inet 69.55.237.157 netmask 255.255.255.255"
ifconfig_fxp0_alias37="inet 69.55.238.222 netmask 255.255.255.0"
ifconfig_fxp0_alias38="inet 69.55.237.160 netmask 255.255.255.255"
ifconfig_fxp0_alias39="inet 69.55.239.149 netmask 255.255.255.0"

Ok, let's first look at the first section:

hostname="jail9.johncompanies.com"
kern_securelevel_enable="NO"
nfs_reserved_port_only="YES"
sendmail_enable="NO"
sshd_enable="YES"
syslogd_flags="-ss"
portmap_enable="NO"
rand_irqs="9 10 11 13 14"
inetd_enable="YES"
inetd_flags="-Ww -a 10.1.4.109"
defaultrouter="69.55.237.1"
nfs_client_enable="YES"
nfs_client_flags="-n 4"

We see that the hostname is set. We also see items like:

kern_securelevel_enable="NO"
nfs_reserved_port_only="YES"

These are simply security settings that do not affect anything you will be dealing with.

Lines like this, though:

sendmail_enable="NO"
sshd_enable="YES"
syslogd_flags="-ss"

are marginally important. Obviously we want sshd to start on jail9, and further, we have no need to run sendmail, so that is set to NO.

Also, since there are multiple IPs on the system, we tell syslog that it should not answer remote queries - so we set a directive to pass -ss along to syslog when it is started at boot time.

These two lines make sure that inetd is only running on the private network.

inetd_enable="YES"
inetd_flags="-Ww -a 10.1.4.109"

The only service that inetd is running is the one that lets us collect load average data.

defaultrouter="69.55.237.1" is self-explanatory.

So, a few notes. First, on an existing FreeBSD machine, there is really no reason to ever edit any of the lines in this first section of /etc/rc.conf. Several of the machines running were loaded over 1.5 years ago, and I have not changed the first section of their /etc/rc.conf at all.

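Second, the syntax of these items is very important - /etc/rc.conf is read by /bin/sh, so a missing " or a space between the directive and the = sign or the = sign and the value will cause failure. Always put the value in double quotes with no spaces around the = sign. Do not use either of the following two forms (the unquoted one only happens to work for a single word with no spaces in it, and the one with spaces fails outright):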

sendmail_enable=NO
sendmail_enable = "NO"
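A quick way to catch these mistakes before a reboot is to scan the file for lines that stray from the name="value" form. The following is only a sketch: the function name is made up, and it deliberately flags unquoted values too, following the quoting convention used in this document.

```shell
# Sketch: report rc.conf lines that are not of the form name="value".

rc_conf_lint() {
    # rc.conf text on stdin; prints "line N: <text>" for each suspect line.
    awk '
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        !/^[A-Za-z_][A-Za-z0-9_]*="[^"]*"[ \t]*(#.*)?$/ {
            printf "line %d: %s\n", NR, $0
        }
    '
}

# Usage:
#   rc_conf_lint < /etc/rc.conf
```

No output means every directive matched the expected pattern.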

However, take a look at the next section, abbreviated for simplicity:

ifconfig_fxp0="inet 69.55.237.129 netmask 255.255.255.0"

ifconfig_fxp0_alias0="inet 198.78.65.130 netmask 255.255.255.0"
ifconfig_fxp0_alias1="inet 198.78.65.131 netmask 255.255.255.255"
ifconfig_fxp0_alias2="inet 198.78.65.132 netmask 255.255.255.255"
ifconfig_fxp0_alias3="inet 198.78.65.133 netmask 255.255.255.255"
ifconfig_fxp0_alias4="inet 198.78.65.134 netmask 255.255.255.255"
ifconfig_fxp0_alias5="inet 198.78.65.189 netmask 255.255.255.255"
ifconfig_fxp0_alias6="inet 198.78.65.136 netmask 255.255.255.255"
ifconfig_fxp0_alias7="inet 198.78.66.222 netmask 255.255.255.0"

First off, we configure the main IP of the system, and that line starts with:

ifconfig_fxp0=

The rest is self-explanatory, however it should be noted that the word "inet" does indeed need to exist in that string.

All additional IPs are added as aliases, starting with #0:

ifconfig_fxp0_alias0="inet 198.78.65.130 netmask 255.255.255.0"

Note that the syntax is identical, except for the addition of the _alias0 at the end of ifconfig_fxp0.

A very important note is that the alias numbers need to be exactly in order, and ascend number by number starting with #0. For example, if you had the following block:

ifconfig_fxp0_alias0="inet 198.78.65.130 netmask 255.255.255.0"
ifconfig_fxp0_alias1="inet 198.78.65.131 netmask 255.255.255.255"
ifconfig_fxp0_alias2="inet 198.78.65.132 netmask 255.255.255.255"
ifconfig_fxp0_alias5="inet 198.78.65.133 netmask 255.255.255.255"
ifconfig_fxp0_alias6="inet 198.78.65.134 netmask 255.255.255.255"
ifconfig_fxp0_alias7="inet 198.78.65.189 netmask 255.255.255.255"
ifconfig_fxp0_alias8="inet 198.78.65.136 netmask 255.255.255.255"
ifconfig_fxp0_alias9="inet 198.78.66.222 netmask 255.255.255.0"

See how it skips from alias2 to alias5? If this were in /etc/rc.conf, only aliases 0, 1, and 2 would be configured - the rest would be ignored.
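Gaps like this are easy to miss by eye in a list of 40 aliases. Here is a minimal sketch of a check for them. It assumes the alias lines start at the beginning of the line and use the ifconfig_<interface>_aliasN naming shown above; the function name is hypothetical.

```shell
# Sketch: find the first missing alias number for one interface.

alias_gap_check() {
    # $1 = interface name (for example fxp0); rc.conf text on stdin.
    # Prints the first missing alias number, or nothing if the numbers
    # run 0, 1, 2, ... without a gap.
    awk -v ifc="$1" '
        BEGIN { prefix = "ifconfig_" ifc "_alias" }
        index($0, prefix) == 1 {
            n = substr($0, length(prefix) + 1)
            sub(/=.*/, "", n)
            if (n ~ /^[0-9]+$/) {
                seen[n + 0] = 1
                if (n + 0 > max) max = n + 0
                count++
            }
        }
        END {
            if (count == 0) exit
            for (i = 0; i <= max; i++)
                if (!(i in seen)) { print "first missing alias: " i; exit }
        }
    '
}

# Usage:
#   alias_gap_check fxp0 < /etc/rc.conf
```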

Finally, note that the first address to be configured from a subnet receives its actual netmask - in this case, 255.255.255.0. However, all additional IPs configured in that subnet receive a subnet mask of 255.255.255.255. In the abbreviated example above, note how the last line, for 198.78.66.222, has a 255.255.255.0 subnet mask? That is because that ifconfig line is the first time an IP from 198.78.66.xxx has been used - all the others were from 198.78.65.xxx.
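The netmask rule can be written down as a small generator. This is a sketch under one loud assumption: every subnet is a /24, as in the examples here, so "same subnet" just means "same first three octets". The function name is invented, and the primary IP is passed in so that aliases sharing its subnet get 255.255.255.255.

```shell
# Sketch: emit alias lines with the right netmask for each IP.

gen_aliases() {
    # $1 = interface, $2 = primary IP (already configured with its real
    # netmask). Alias IPs on stdin, one per line. Assumes /24 subnets.
    awk -v ifc="$1" -v primary="$2" '
        function net24(ip,   p) { split(ip, p, "."); return p[1] "." p[2] "." p[3] }
        BEGIN { seen[net24(primary)] = 1; n = 0 }
        NF {
            net = net24($1)
            mask = (net in seen) ? "255.255.255.255" : "255.255.255.0"
            seen[net] = 1
            printf "ifconfig_%s_alias%d=\"inet %s netmask %s\"\n", ifc, n++, $1, mask
        }
    '
}

# Usage:
#   printf '%s\n' 198.78.65.130 198.78.65.131 198.78.66.222 | gen_aliases fxp0 69.55.237.129
```

The output is numbered from alias0 upward with no gaps, so it can be compared against the existing block in /etc/rc.conf.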

Ports

This third lesson deals with the FreeBSD ports tree, which is the software packaging/installation mechanism that is used in FreeBSD. You may remember that, in the few times we have installed FreeBSD over the phone, one of the components we always install is "Ports" - this is the ports tree.

In general, across all unix variations, the method for installing new software is always about the same - you download a source tarball and unzip/untar it, run ./configure, make, make install. Sometimes it is a bit different, but that's the general procedure.

As you know, there are shortcuts in every operating system to compiling and building a package. For instance, in Solaris, you can get pre-packaged binaries that are already compiled and install them with the `pkgadd` program. Or, even more popular, in Linux you can use the `rpm` command to install RPM files that are also prepackaged, precompiled binaries for Linux. Another popular method is the Debian method which uses .deb files, I believe. The important thing about all of these shortcut methods is that they are a manner of packaging up _precompiled_ binaries - you don't actually compile them when you install them. pkgadd, rpm, etc., are all just ways of copying and placing files (and chmodding them, etc.)

The ports tree is different.

First, let's take a look at the ports tree and see why it is called a "tree". Run this command:

cd /usr/ports

then do an `ls`. You will see a fairly large list of directories, and each directory is obviously a category of software (sysutils, security, ukrainian, cad, ftp). Now run this command:

cd /usr/ports/sysutils

Now run `ls` again - you will see that in the sysutils directory there are all sorts of ... system utilities. Who knows what they all are or what they all do, however you will see some familiar ones ... rdate, logrotate, httplog, etc. Now run this command:

cd /usr/ports/sysutils/rdate

Now run `ls` again. This is the ports directory for the rdate application. As you can see there is a distinfo, pkg-descr and pkg-comment file - if you read those they will tell you about this distribution, what rdate is and what it does. There is also a "work" directory, and most importantly a Makefile. But, please note, there is no package file and no source code - the package is not here - so what is all this for and what does it do ?

Run these commands: (make sure name resolution works on your system first)

cd /usr/ports/sysutils/rdate
make install clean

Now, watch what happens - suddenly you are fetching files over ftp ... those files are being untarred, and compilation is occurring ... rdate is rather small, so in a short time it should be all finished. Now run the command `rehash` to get your shell to take note of the new `rdate` command that is now in your path (alternatively you could log out and log back in).

Now run rdate -s time.nist.gov

rdate has now been installed.

So, what happened ? Well, open up the Makefile for rdate in a text editor and note the line that says:

MASTER_SITES=   http://www3.cons.org/freebsd-distfiles/ \
                ${MASTER_SITE_LOCAL}
MASTER_SITE_SUBDIR=     cracauer

So, when you ran make install clean, the Makefile had the information as to where to go to _get_ the source tarball. It downloaded the source into /usr/ports/distfiles (run ls /usr/ports/distfiles and you'll see it there) then it untarred it, and configured and installed it, just as if you were compiling the package by hand. The clean target for make causes the work dir to be removed after compilation so that no extra disk space is wasted.

So, the big difference between the FreeBSD ports tree and the other methods on other unixes (rpm, pkgadd, deb) is that you don't have a precompiled binary - you install from source, even though you are using the ports tree.

The other big difference is that it is arranged as a tree, as you have seen. When you want to install `wget`, you:

cd /usr/ports/ftp/wget
make install clean

and when you want to install BitchX you:

cd /usr/ports/irc/bitchx
make install clean

And every time the source tarballs will end up in /usr/ports/distfiles

It should be noted that after you have successfully installed a port, you can just delete the tarballs out of /usr/ports/distfiles if you want to save the space....or you can leave them in case you ever install the port again, since they are already there, FreeBSD won't bother downloading them again.

So that's the primary explanation. Most people that know FreeBSD know all of the above (although some people don't know to run `rehash` after installing to get the new binaries into your path - they just log out and log back in). It is interesting to note that regardless of one's own preference concerning which OS to use, almost _everyone_ (even hardcore Linux zealots) concedes that the FreeBSD ports tree is the nicest, easiest, and most configurable way to do package management. It is considered one of the main advantages of using FreeBSD, from a user standpoint.

So let's move on to some advanced topics concerning the ports tree.

First, make install clean can fail. Rdate was a very simple program to install, but sometimes you install something very complicated, and it depends on other packages, and those other packages depend on even other packages. Now the nice thing is, the ports tree manages all of this – if you install a package with a lot of dependencies, it installs all of them automatically - and installs all of their dependencies automatically as well. All you have to do is go to the single port you want and run make install clean and it does them all in the right order, etc. It's very slick.

But, sometimes a port (like rdate) has only one master site listed in the Makefile (a lot of ports have many sites listed) and if that site is down, it can't download the source tarball, and obviously it fails. Another common problem is that you have a newly installed FreeBSD machine and you forget to populate /etc/resolv.conf so you have no name lookups – which means when you make install and it tries to connect to ftp.example.com it fails. Further, you could have an old ports tree from an old installation that asks for some_software-1.2.3.tar.gz, and the ftp site no longer has that old file there anymore - they only have 2.3.4...so again it will fail.

Regardless of where it fails though (maybe the main package failed, or maybe a dependency of a dependency of a dependency fails) you clean things up simply by going to the ports directory that you initially ran make install in and just run make clean. Whatever error messages you saw when it bombed out will tell you what port it was on that it bombed out on (because again, it might bomb out not on the port you are installing, but on some dependency of the port you installed) and it will give a clue as to why it bombed out. So use that info, fix the problems, run `make clean` and then run `make install` again.

Second, you can populate /usr/ports/distfiles by hand. So let's say you make install clean and it bombs out because it can't download a file (maybe the ftp site is down). If you can find that exact same file somewhere else on the net, you can download it into /usr/ports/distfiles and run make install again - it will find it and just use it.
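Before hunting around the net for a tarball, it can help to check whether a usable copy is already sitting in the distfiles directory. A minimal sketch follows; the function name is hypothetical, and the tarball name in the usage line is the made-up one from the example above, not a real distfile.

```shell
# Sketch: is a non-empty copy of a distfile already present?

have_distfile() {
    # $1 = distfile name, $2 = distfiles directory (optional,
    # defaults to /usr/ports/distfiles).
    dir=${2:-/usr/ports/distfiles}
    [ -s "$dir/$1" ]               # true only if the file exists and is not empty
}

# Usage:
#   have_distfile some_software-1.2.3.tar.gz && echo "already downloaded"
```

A zero-length file is treated as missing, since an interrupted download can leave one behind.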

Third, installing a port over itself - let's say you install a port, and then either want to reinstall it later, or reinstall it with a newer ports tree (more on that later). All you have to do is go into the ports dir for that app (for instance /usr/ports/sysutils/rdate) and run make install again. There may be exceptions to this: if there are major differences between version A of an app and version B, it won't always work. But in almost all cases, you can just go back to the port directory for that app and run make install again.

Fourth, if you want to look at what ports are installed on a particular FreeBSD system, just run the command pkg_info and you will see a list of all ports installed on the system. However, this info is extracted from /var/db/pkg, so if you delete /var/db/pkg you won't get any info.

Fifth, you can change the way that a port compiles and/or installs by editing the Makefile in the directory of that port. You rarely need to do this, and in most cases, like you saw with `rdate`, there is not even much to the Makefile itself. However, many times there are arguments you can change in the Makefile to alter the way that the port compiles and eventually installs.

However, more commonly, you alter the way that a port installs by adding additional command line arguments to the normal make install clean. For instance:

Edit the Makefile in /usr/ports/net/cvsup and note this line:

.ifdef WITHOUT_X11

This tells us that, if we wanted to, we could install this port with the command line:

make WITHOUT_X11=yes install

Why would we do this? Well, cvsup is a very common utility that allows you to sync or install files and filesystems from servers holding cvs repositories (cvs is a software version control system, and it is very popular among large open source software projects, including FreeBSD/NetBSD/OpenBSD). However, for some dumb reason when you go to /usr/ports/net/cvsup and run make install the port also installs all these silly X utils for running cvsup ... and of course, all those X utils depend on a lot of X binaries and libraries. Many people are shocked when they try to install the simple cvsup utility and watch the port install 100 megs of X dependencies for some lame helper apps they will never even use.

Thus, most folks install cvsup with make WITHOUT_X11=yes install clean. This is just a small example of why you might add additional arguments on the make install clean command line to alter the way a port either compiles or installs.

Additionally, if you know you’ll never have X11 installed, you can put

WITHOUT_X11=yes

into /etc/make.conf and that will be defined for any port that you build.

Sixth, you can actually patch a software package from within the ports tree. As you remember, we used the `patch` command to patch the source code for the FreeBSD kernel. Usually, as you saw, we always have a "patch file" with the "unified diff" syntax that we run the patch command against. Well, in the world of compiling your own packages from source tarballs, it is very common to patch the source for a software package just like we patched the code for the FreeBSD kernel.

Now, since the ports tree actually downloads the real source tarballs and does an actual compilation of the source code (unlike rpm or pkgadd which just copies in precompiled binaries) it stands to reason that if we need to patch a package's source, we could do it even though we are installing from the ports tree ... and we can. However, this is somewhat rare. The reason it is rare is that if you are meticulous enough to be patching a particular version of the source for a package, you are probably just installing it by hand anyway. I have only ever patched a port once in my life. But just for your information, you just take the patch file and place it into:

/usr/ports/(category)/(port)/files

and just run make install clean as usual. The port will see that there is a patch file in the files subdirectory and attempt to apply it automatically. Easy.

Now, the last advanced topic to understand about the ports tree is _updating_ the ports tree. Let's say you install FreeBSD 4.4-RELEASE and you don't ever upgrade it, but everything is ok and you don't feel the need to upgrade. However, a new version of XYZ app comes out and for some reason you want to just install it from the ports tree .... but your ports tree that got installed with 4.4-RELEASE is an older one, and only knows to download and compile the older version of XYZ app. What to do ?

Well, since you do not want to upgrade or reinstall your OS, what you need to do is update your ports tree. I am not going to provide the step by step for that here, since I have a URL for the process that works very well, but here are some pointers for the process:

1. It involves using the cvsup command we mentioned earlier. You should install this, and use the WITHOUT_X11 switch most likely.

2. You can update only certain categories of the ports tree – for instance, there is little reason to update the entire ports tree and get all the "ukrainian" and "math" and "russian" ports - you can instruct cvsup to just update a particular category like: /usr/ports/sysutils

3. If your version of FreeBSD is old (like 3.1) and you update to a new ports tree from a current version like 4.7, obviously some apps you install from the new 4.7 tree might be confused. However, the beauty of the ports tree is that that should happen only very rarely - remember that when you install the port a full configure and make (compilation) occurs, so the compilation process should compile for the version you are running, and all should be well.

The instructions for updating your ports tree are here:

http://www.freebsddiary.org/ports.php

Please read it and pick some ports category you will never use like /usr/ports/chinese and update that with cvsup.

Also, if you are ever wondering what the vintage of your particular ports tree is, you can:

cat /var/db/port.mkversion
20020814

and see the date. If port.mkversion does not exist or holds an older date than some ports expect, they will not install. You can either solve the problem that caused it, or you can just edit that file and put in some later date and things will work again.
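The date comparison is simple enough to script. Here is a rough sketch, assuming the file holds a single YYYYMMDD date on its first line as in the example above; the function name is made up, and the date a given port actually requires has to come from that port's own error message.

```shell
# Sketch: is the ports tree older than a required date?

ports_tree_older_than() {
    # $1 = required date as YYYYMMDD
    # $2 = version file (optional, defaults to /var/db/port.mkversion)
    file=${2:-/var/db/port.mkversion}
    [ -r "$file" ] || return 0                   # a missing file counts as too old
    have=$(head -n 1 "$file" | tr -cd '0-9')
    [ -n "$have" ] || return 0                   # so does a file with no date in it
    [ "$have" -lt "$1" ]
}

# Usage:
#   ports_tree_older_than 20030101 && echo "ports tree is too old for this port"
```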

Manually adding and subtracting IPs from a machine

UPDATE: there are easy scripts to do all this now, see ipadd and ipdel

So, let's say you want to add an IP to an existing FreeBSD system. Presumably there are other IPs already bound to the network card, so any additional ones you configure would be aliases.

You would run a command like this:

ifconfig em0 alias 69.55.228.10 netmask 255.255.255.0

ALWAYS triple-check these commands !!!! If you run this command without the "alias" as the third word, you will replace the existing IP on that system with this new one, and since sshd is only bound to the original IP, your session will end, and the machine will effectively be totally offline. There is no way to get it back - you would have to call the datacenter and have them powercycle it. Also, take note of the device: in this example we used em0, but perhaps on the server you'll have bge0, or possibly the NICs are reversed and the public NIC is bge1. Always run ifconfig once first to see what's going on.

Now, as is true with /etc/rc.conf, if the ip you were adding was part of a subnet that already had some IPs configured on this machine, you would alter the netmask to be 255.255.255.255:

ifconfig em0 alias 69.55.228.10 netmask 255.255.255.255

Now, if you wanted to remove an IP from the system, you would first run:

ifconfig -a

to see all the IPs currently configured, and then you would run:

ifconfig em0 -alias 69.55.228.10 netmask 255.255.255.255

Notice the subtle addition of the minus sign in front of the word alias. The IP and subnet mask in this line should match what you see in the output of `ifconfig -a` - so if it is 255.255.255.255 already, you should remove it as such, and if it is 255.255.255.0, then you should remove it like that.

After adding a new IP, you need to add an alias line to /etc/rc.conf – you can just add a new alias line at the bottom of the list and make sure the alias # is the next number in the list.

However, if you remove an IP, then you need to remove a line from /etc/rc.conf - and that means that you need to watch for two things:

a) the alias numbers _must_ be sequential, in order, and starting with 0 - so if you remove a line from the middle, you will need to renumber every line after it.

b) if you remove an IP that was the first IP of a subnet to be used on this system (which would have a subnet mask of 255.255.255.0), then you need to find the next IP of that subnet and change that netmask to 255.255.255.0, because now it is the first one in that subnet.
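Renumbering by hand after deleting a line from the middle is error-prone, so here is a minimal sketch that does point (a) mechanically. It renumbers the alias lines for one interface in the order they appear in the file and leaves everything else alone. It does not handle point (b), the netmask promotion, and the function name is hypothetical. Write the result to a new file and look it over before replacing /etc/rc.conf.

```shell
# Sketch: renumber ifconfig_<if>_aliasN lines so they run 0, 1, 2, ...

renumber_aliases() {
    # $1 = interface; rc.conf text on stdin, rewritten text on stdout.
    awk -v ifc="$1" '
        BEGIN { prefix = "ifconfig_" ifc "_alias"; n = 0 }
        index($0, prefix) == 1 && match($0, /=/) {
            # Keep everything from the = sign on, replace only the number.
            printf "%s%d%s\n", prefix, n++, substr($0, RSTART)
            next
        }
        { print }
    '
}

# Usage:
#   renumber_aliases fxp0 < /etc/rc.conf > /etc/rc.conf.new
```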

Keep in mind that when the system is up and running, and you are adding and subtracting IPs, /etc/rc.conf plays no role - you simply are editing it and reflecting the new changes for the _next time the system boots_.

So, if you forget to add or subtract the IP from /etc/rc.conf, or if you get the alias numbers out of order, you will not know until the next reboot.

REMINDER: always triple-check ifconfig commands you are running - if you forget that alias word, the main IP of the system will go away, you will have no access to that system, and I think all the other IPs on the system stop functioning as well. You must immediately reboot the system.

So, what happens if you do mess up /etc/rc.conf, don't notice it, and then reboot sometime later with it ?

Well, it depends - if it contains an IP that should have been removed, then perhaps there is no problem - it just has one extra IP. However if you moved that IP to a different server, then there will be an IP conflict. You may not notice this yourself - every other IP, and the machine itself will work fine ... but know that if you get emails telling you that sometimes their site responds and sometimes it doesn't, or sometimes they get their site and sometimes they get another site, then this should clue you in that perhaps there is an IP conflict.

But, if the opposite occurs - if there are IPs that should be in /etc/rc.conf and are not _OR_ if out of order alias numbers cause some portion of the IPs that should have been configured to not be, then you have a problem.

Two things:

a) any time a freebsd system comes up, it would be wise to run `ifconfig -a` and compare it to what you see in /etc/rc.conf - just as a sanity check - this is before you run any quad or safe scripts. If there is indeed a mismatch, the fastest thing to do is simply fix what is wrong in /etc/rc.conf and reboot the system.

b) if there are IPs missing and you do not notice until some or all of the systems are up, then you need to manually add the IPs that are missing.

There is strong indication that you can add an IP to a jail _after_ the jail has started. Meaning that, the jail command for that system has already run, it is up and running, but simply cannot be contacted because the IP it is supposed to be running under does not exist.

I do not recommend doing this though ... although sshd and sendmail seem to function immediately after configuring the IP that that jail should have, it is not clear that other server daemons do.

So the best thing to do is just do that /etc/rc.conf and `ifconfig -a` comparison after the system boots, and before you run any quad/safe scripts.
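That comparison can be sketched as a script. This is only an illustration: it assumes the IP is the first address inside the quotes on each ifconfig_ line, that `ifconfig -a` prints one "inet <ip> ..." line per address, and that the ifconfig output has been saved to a file first. The function name and the saved-output file are inventions for this example.

```shell
# Sketch: compare the IPs in rc.conf with the IPs actually configured.

ip_mismatch() {
    # $1 = rc.conf path, $2 = file holding saved `ifconfig -a` output.
    want=$(sed -n 's/^ifconfig_[^=]*="\(inet \)\{0,1\}\([0-9][0-9.]*\).*/\2/p' "$1")
    have=$(awk '$1 == "inet" && $2 != "127.0.0.1" { print $2 }' "$2")
    for ip in $want; do
        printf '%s\n' "$have" | grep -qxF "$ip" || echo "missing on host: $ip"
    done
    for ip in $have; do
        printf '%s\n' "$want" | grep -qxF "$ip" || echo "not in rc.conf: $ip"
    done
}

# Usage (the output file name is arbitrary):
#   ifconfig -a > ifconfig.out
#   ip_mismatch /etc/rc.conf ifconfig.out
```

No output means the two lists agree, and it should then be safe to go on to the quad/safe scripts.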

MD/VN Devices and Mounting Customer Filesystems

Ok, so we have discussed how to dump and restore filesystems - a fairly easy concept. But, what to do with these dumped filesystems, once they are dumped and exist in the form of a file ?

We use a special device, md (memory disk), to attach these dump files. This creates an md-backed (or vn-backed) filesystem, also known as a file-backed filesystem. Once configured, we can use it like we would a physical disk: we can fsck it, mount it, etc.

On older 4.x systems we used a similar technology called vn. The difference is that md devices are created on the fly (as needed), whereas vn* devices are pre-created when we create the new system.

To create these from scratch, we start with an empty file. The process of configuring the file will assign it a particular size: the size of a customer's jail. That means a 20 GB md-backed filesystem will actually be a 20 GB regular file inside which a full filesystem exists. That regular file is configured with mdconfig (or vnconfig), and then newfs'd and then mounted, like any other filesystem.

When we create a new md/vn we start off with a simple, empty file. Running md/vnconfig on the file with a particular size (say 20 GB) creates a truncated (sparse) file, allowing us to oversell disk space. See below for how truncation works.

Here is the procedure for creating a new mdfile-backed filesystem:

Make & configure the mdfile:

touch /path/to/mdfile 
mdconfig -a -t vnode -s <GB>g -f /path/to/mdfile -u <#>

Ex:

touch /mnt/data2/69.55.232.68-col02144
mdconfig -a -t vnode -s 20g -f /mnt/data2/69.55.232.68-col02144 -u 11

-s specifies how big to make the filesystem/file
-f specifies which file to use for the mdfile (just make sure the file is not in use, and if it doesn't exist, touch it first)
-u specifies which md device number you may use. Any unused md # is ok, you can see what's already in use (and therefore what NOT to use) by running js or mdconfig -l

Format the filesystem:

newfs /dev/md<#>

Ex: newfs /dev/md11

Create a mountpoint:

mkdir /path/to/dir

Ex: mkdir /mnt/data2/69.55.232.68-col02144-DIR

Mount it:

mount /dev/md<#> /path/to/dir

Ex: mount /dev/md11 /mnt/data2/69.55.232.68-col02144-DIR

Confirm it's properly mounted at the right mount point and has the correct size:

cd /path/to/dir
df -h .

Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/md<#>   <size>    0      <size>    0%    /path/to/dir

Ex:

cd /mnt/data2/69.55.232.68-col02144-DIR
df -h .

Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/md11     20G       0M     20G     0%    /mnt/data2/69.55.232.68-col02144-DIR

Look to make sure the mount point is right (/mnt/data2/69.55.232.68-col02144-DIR), the device is right (/dev/md11), and the size is right (20G).
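The md steps above can be collected into one place. The sketch below only prints the commands rather than running them, so it is safe to try anywhere. The function name `md_create_plan` is made up, and the `-DIR` mount point suffix is simply taken from the examples above.

```shell
#!/bin/sh
# Sketch only: print the commands for a new md-backed filesystem.
# md_create_plan <mdfile> <size, e.g. 20g> <md unit number>
md_create_plan() {
    mdfile=$1
    size=$2
    unit=$3
    mnt="${mdfile}-DIR"   # mount point naming follows the examples above
    echo "touch $mdfile"
    echo "mdconfig -a -t vnode -s $size -f $mdfile -u $unit"
    echo "newfs /dev/md$unit"
    echo "mkdir $mnt"
    echo "mount /dev/md$unit $mnt"
    echo "df -h $mnt"
}

md_create_plan /mnt/data2/69.55.232.68-col02144 20g 11
```

Review the printed commands before pasting them into a root shell, and check the final df output against the expected device, size and mount point.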

Here is the procedure for creating a new vnfile-backed filesystem:

Configure the vnfile:

vnconfig -T -S <#>g -s labels -c /dev/vn<#> /path/to/vnfile

Ex: vnconfig -T -S 20g -s labels -c /dev/vn12 /mnt/data1/69.55.230.44-col00137

-S specifies how big to make the filesystem/file
-c specifies which vn device number you may use. Any unused vn # is ok, you can see what's already in use (and therefore what NOT to use) by running js

Create a partition on the device:

disklabel -r -w vn<#> auto

Ex: disklabel -r -w vn12 auto

Format the filesystem:

newfs /dev/vn<#>c

Ex: newfs /dev/vn12c

Create a mountpoint:

mkdir /path/to/dir

Ex: mkdir /mnt/data1/69.55.230.44-col00137-DIR

Mount it:

mount /dev/vn<#>c /path/to/dir

Ex: mount /dev/vn12c /mnt/data1/69.55.230.44-col00137-DIR

Confirm it's properly mounted at the right mount point and has the correct size:

cd /path/to/dir
df -h .

Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/vn<#>c  <size>    0      <size>    0%   /path/to/dir

Ex:

cd /mnt/data1/69.55.230.44-col00137-DIR
df -h .

Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/vn12c     20G      0M     20G     0%    /mnt/data1/69.55.230.44-col00137-DIR

Look to make sure the mount point is right (/mnt/data1/69.55.230.44-col00137-DIR), the device is right (/dev/vn12c), and the size is right (20G).
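The three-way check (device, size, mount point) can also be done mechanically. The helper below is a hypothetical sketch, not an existing tool: it takes one data line of `df -h` output and compares the relevant columns to what you expect.

```shell
#!/bin/sh
# Hypothetical check: does a `df -h` data line show the expected device,
# size and mount point? Prints "ok" or a description of the first mismatch.
# check_df_line <df line> <device> <size> <mount point>
check_df_line() {
    echo "$1" | awk -v dev="$2" -v size="$3" -v mnt="$4" '{
        # df columns: Filesystem Size Used Avail Capacity Mounted-on
        if ($1 != dev)       print "wrong device: " $1
        else if ($2 != size) print "wrong size: " $2
        else if ($6 != mnt)  print "wrong mount point: " $6
        else                 print "ok"
    }'
}
```

For example: `check_df_line "$(df -h /path/to/dir | tail -1)" /dev/vn12c 20G /path/to/dir`.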

At this point, there is now a 20 GB filesystem mounted on the system, with a mount point of "/path/to/dir". If you cd to /path/to/dir and place 1 GB of files there, the disk space used on the mount point on which the md or vnfile are located (i.e. /mnt/data1) and the md/vn device mount point, will go up by 1 GB ... but the size of the md/vn file will remain unchanged at 20 GB.

To illustrate how this works, let's see how we determine whether a file is truncated and to what extent. Compare an existing customer's md/vn file size as reported by ls with what du reports:

jail9 /mnt/data1# ll -h 69.55.232.70-col01963
-rw-r--r--  1 root  wheel    30G Apr  6 14:45 69.55.232.70-col01963

The md file is 30 GB

jail9 /mnt/data1# du -sh 69.55.232.70-col01963
4.7G    69.55.232.70-col01963

However, since it was created as a truncated file, it's only occupying 4.7 GB of disk space, because the customer has written (up to*) 4.7 GB of data in their filesystem at some point.

In other words, if you md/vnconfig a 100 GB device and file, then immediately run a df of the underlying filesystem (i.e. /mnt/data1), you will not see any change in the free space available. If you then mount that md/vnfile and write 100 GB into the mounted device, 100 GB will disappear from the underlying filesystem (i.e. /mnt/data1) and from the md/vn filesystem.

jail9 /mnt/data1# df -h 69.55.232.70-col01963
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/md24      30G    1.6G     27G     9%    /mnt/data1/69.55.232.70-col01963

*They are currently only using 1.6 GB in their filesystem. Why doesn't du report 1.6 GB as well? Because once the md/vnfile has been written to ("expanded"), it doesn't contract again even if you delete files - its usage only ever increases. For instance, someone could fill all their disk space and then delete every file; their md/vnfile would still occupy the full size of their filesystem (30 GB in this example). To reclaim the unused space you'd need to create a new md/vn file and dump their filesystem into that.
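The ls-versus-du difference can be reproduced on a scratch file without touching any md/vn device. This is a minimal demonstration: the 20 MB size is arbitrary, and it assumes the filesystem holding the temporary file supports truncated (sparse) files.

```shell
#!/bin/sh
# Demonstration on a scratch file (no md/vn device involved).
f=$(mktemp)

# Seek 20 MB into the file and write nothing: this sets the file length
# without allocating data blocks, leaving a truncated (sparse) file.
dd if=/dev/zero of="$f" bs=1024 count=0 seek=20480 2>/dev/null

apparent=$(wc -c < "$f" | tr -d ' ')         # length in bytes, as ls reports it
allocated=$(du -k "$f" | awk '{print $1}')   # KB actually occupied, as du reports it

echo "apparent size:  $apparent bytes"
echo "allocated size: $allocated KB"

rm -f "$f"
```

The apparent size is the full 20 MB, while the allocated size should be close to zero until data is written into the file.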

A few details:

1. Please note the specific syntax of the commands we used to make the filesystem - specifically, how we refer to /dev/vn12. In the vnconfig command, we refer to "/dev/vn12", however in the disklabel command we refer to just "vn12", and then in the newfs and mount commands, we refer to "/dev/vn12c". These must always be just like this. You cannot run the disklabel command with "/dev/vn12", and you cannot run the vnconfig command with "/dev/vn12c" - the syntax needs to be exactly as it is shown above. In general, avoid using md0-2 and vn0-2 since we may use those for file-backed swap on some systems.

2. The disklabel and newfs commands are, of course, only needed when first creating the new filesystem - if you were to unmount this new filesystem, and use mdconfig -u <#> or vnconfig -u /dev/vn<#> to disassociate the md/vnfile from the md/vn device, you would then be left with just a plain old file sitting on your disk.

The process of unmounting and unconfig'ing md/vn devices, and then re-configing and remounting is as follows:

For md:

Take it down:

umount /path/to/dir

or

umount /dev/md<#>

(note, your pwd must not be inside the mount point)

Ex: umount /mnt/data2/69.55.232.68-col02144-DIR

or

umount /dev/md11

mdconfig -d -u <#>

Ex: mdconfig -d -u 11

Then to bring it back up:

mdconfig -a -t vnode -f /path/to/mdfile -u <#>

Ex: mdconfig -a -t vnode -f /mnt/data2/69.55.232.68-col02144 -u 11

mount /dev/md<#> /path/to/dir

Ex: mount /dev/md11 /mnt/data2/69.55.232.68-col02144-DIR

So, first we umount the filesystem, like we would any other filesystem. Then we use mdconfig -d -u # to disassociate it from the md device.

Now, when we reconfigure it against md11, note that the mdconfig command is much simpler than the original mdconfig command we used to create this filesystem - you do not need the '-s #GB' argument.

Then we mount the filesystem again.

For vn:

Take it down:

umount /path/to/dir

or

umount /dev/vn<#>c

(note, your pwd must not be inside the mount point)

Ex: umount /mnt/data1/69.55.230.44-col00137-DIR

or

umount /dev/vn12c

vnconfig -u /dev/vn<#>

Ex: vnconfig -u /dev/vn12

Then to bring it back up:

vnconfig /dev/vn<#> /path/to/vnfile

Ex: vnconfig /dev/vn12 /mnt/data1/69.55.230.44-col00137

mount /dev/vn<#>c /path/to/dir

Ex: mount /dev/vn12c /mnt/data1/69.55.230.44-col00137-DIR

So, first we umount the filesystem, like we would any other filesystem. Then we use vnconfig -u /dev/vn# to disassociate it from the vn device.

Now, when we reconfigure it against vn12, note that the vnconfig command is much simpler than the original vnconfig command we used to create this filesystem - you do not need the `-s labels -c` arguments.

Then we mount the filesystem again.

3. A particular md/vnfile is not married to the device it was created with. In the above example we created the device on /dev/md11 and /dev/vn12, and then mounted it with that - but, if we were to unmount it, unconfig it from /dev/md11 or /dev/vn12 (at this point we have nothing but a plain old file), we could then mdconfig or vnconfig it against md25 or vn14 and mount /dev/md25 or /dev/vn14c instead - it would work just fine.

The only caveats, of course, are that /dev/md25 must not already be in use, and /dev/vn14 must exist, and vn14 must not already be in use by some other filesystem.

4. A particular md/vnfile is also not wedded to its machine, or its mount point. If you umount a md/vn-backed filesystem and then unconfig it, you can copy that file to some other system, md/vnconfig it to a different md/vn device number, and mount it on some totally differently named mount point. While technically possible, this is not a good idea. When using truncated files, the moment you move that file to a different filesystem (even on the same machine: /mnt/data1 -> /mnt/data2), the empty/unused/truncated parts of the file are filled with 0's by the sending process as the file is transferred off the source filesystem. So while you started off with a 20 GB file (as reported by ls) and 1 GB usage (as reported by du), after the move it will be reported as 20 GB by both ls and du. Therefore, any time we move a file off a filesystem we use dump and restore. See dumpremoterestore and dump and restore.

Note: if you un-md/vnconfig a device before unmounting it, re-md/vnconfig it, then unmount, then un-md/vnconfig.
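Item 3 above notes that any unused md number will do, and item 1 advises staying away from md0-2. Picking a number can be sketched as below. The helper name `next_free_md` is made up, and it assumes `mdconfig -l` prints the in-use device names separated by whitespace (e.g. `md0 md3 md11`).

```shell
#!/bin/sh
# Hypothetical helper: print the lowest md unit number that is not in use.
# Arguments: the device names printed by `mdconfig -l` (e.g. md0 md3 md11).
# Units 0-2 are skipped because they may be reserved for file-backed swap.
next_free_md() {
    unit=3
    while :; do
        in_use=no
        for dev in "$@"; do
            if [ "$dev" = "md$unit" ]; then
                in_use=yes
            fi
        done
        if [ "$in_use" = no ]; then
            echo "$unit"
            return 0
        fi
        unit=$((unit + 1))
    done
}
```

A possible call is `next_free_md $(mdconfig -l)`; the result is the number to pass to `mdconfig ... -u`.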

Dump and Restore

One of the basic tasks that we need to perform in this organization is creating filesystem images and accurately cloning them across servers.

Since each server instance running on the machines has a full FreeBSD filesystem, it's important to understand the correct ways of cloning a filesystem and extracting the filesystem into multiple places.

The natural assumption is to use tar(1). However, tar(1) is not appropriate for archiving special files and for preserving all ownership and special attributes of certain files. This is important to keep in mind, because when we create a file system image, we recreate the entire OS. Run this command to see what type of "special" files are sometimes not treated correctly with tar(1):

`ls -asl /dev | more`

Please note the major/minor device numbers that these files have.

It would be difficult to describe all the shortcomings of tar(1), but they exist, and must be avoided by using different tools. An almost monthly question that appears on the FreeBSD mailing lists is "what should I use to safely backup this kind of filesystem or these kind of files and be sure that I am faithfully preserving everything." This is where dump(8) and restore(8) come in. A FreeBSD core member was once quoted on one of the mailing lists saying "Use dump or you will lose".

It is actually unfortunate that we cannot use tar(1) because it is much simpler, and more versatile. Further, many of the files that we are guaranteeing accuracy on by using dump(8) and restore(8) are probably not that important in the environment we run the server instances in. We provide a full /dev to our customers, but really they need very little of it. However, it's not for me to say why they are getting a FreeBSD server, and so I try to reproduce a real one as faithfully as possible.

Basically the dump(8) command dumps a filesystem to a regular file, and the restore(8) command restores the data from that file into a path you specify at some later date. dump(8) is non-versatile in the sense that you can only dump filesystems. So, you can dump /var (if you have /var as a separate partition), but you cannot dump /var/db by itself, since presumably that is not on its own partition. If you want to dump /var/db, you need to dump all of /var. And if /var was not its own partition, you would have to dump all of / to get it. This is unfortunate, but it is how dump(8) works and there is no getting around it.

restore(8) is more flexible in the sense that you can restore a dump-file into any location. Just cd to where you want the dump-file expanded and expand it. Easy.

Let's give it a shot:

First, I run the command `df -k` to see what filesystems I have on my server:

www# df -k
Filesystem    1K-blocks     Used    Avail Capacity  Mounted on
/dev/mlxd0s1a   1016303   862925    72074    92%    /
/dev/mlxd0s1f   7030220  5488394   979409    85%    /mnt/data1
/dev/mlxd1s1e  17369623 11692971  4287083    73%    /mnt/data2
/dev/mlxd0s1e    508143   181167   286325    39%    /var
procfs                4        4        0   100%    /proc
www#

I'll choose the /var partition for this example since it is small and will go quickly. The syntax for dumping a filesystem to a file is:

`dump -0a -f /mnt/data2/dump-file /dev/mlxd0s1e`

So, we are dumping to a file /mnt/data2/dump-file that does not already exist, and we are dumping from the device /dev/mlxd0s1e which, as you can see from the df(1) output is the device that /var is mounted on. It is worth repeating that you can only dump an entire filesystem, and that filesystem is referred to by the device it is mounted on.

www# dump -0a -f /mnt/data2/dump-file /dev/mlxd0s1e
DUMP: Date of this level 0 dump: Tue Aug 20 00:21:25 2002
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/mlxd0s1e (/var) to /mnt/data2/dump-file
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 181700 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: DUMP: 181684 tape blocks on 1 volume
DUMP: finished in 52 seconds, throughput 3493 KBytes/sec
DUMP: Closing /mnt/data2/dump-file
DUMP: DUMP IS DONE
www#

So now we have a backup of /var in a dump-file named /mnt/data2/dump-file. We can just save that until we need it - it is ok to compress it or tar it up or even rename it - it is simply a regular file.

Now we wish to restore the contents of dump-file using the restore(8) command. Remember that we can restore the file anywhere – we can restore it on a partition like we dumped from (such as /var) or we can restore it in a deeply nested directory such as /usr/local/etc/rc.d. It will restore correctly regardless of where you restore it.

www#
www# mkdir /mnt/data2/test
www# mkdir /mnt/data2/test/test
www# cd /mnt/data2/test/test
www# restore -x -f /mnt/data2/dump-file

You have not read any tapes yet. Unless you know which volume your file(s) are on you should start with the last volume and work towards the first.
Specify next volume #: 1
set owner/mode for '.'? [yn] y
www#

These are probably the only arguments you will ever use for restore(8) unless you place it in a script, which we will discuss in a bit. You'll note that two pieces of user input are needed in this process, as opposed to dump(8) which did not ask any questions. First, we are asked to "Specify next volume #:", to which we reply "1" (and we will almost certainly always simply reply with "1", since we are not doing multi volume tape sets or anything complicated like that). Second, many seconds later, once the files have been extracted, we are asked if we want to "set owner/mode for '.'?" and the answer is Yes, or more specifically, the y key followed by a carriage return. The whole point of using these tools is to preserve attributes, so it is natural that we would say yes to this question.

Now that you've learned how to do it manually, you'll be happy to know there's a script that automatically enters the 1 and y for you:

dumprestore <path-to-dump-file>

However, this sometimes fails to set proper permissions, so either follow up and check them afterwards or don't use it.

Some additional information that you should know:

First, dump(8) and restore(8) are not just FreeBSD commands. They are very old, historical unix commands. They have the ability to interact with complex systems of tapes/volumes and are the core of a lot of enterprise backup schemes. You would be surprised how many unix admins use these tools instead of big fancy backup products.

Second, whereas the only syntax we will probably ever use with dump(8) is this:

dump -0a -f </target/file> </device/name>

there is another syntax that we will sometimes use with restore(8). In our example, we restored in an interactive environment - that is, we are logged onto a server and running restore(8) in a shell. We use this syntax:

restore -x -f /some/dump-file

and we then answer the two questions that require an interactive response. However, sometimes we will want to run restore(8) out of a script, which presents a problem because we cannot answer the questions in a script when it runs. Therefore, if we restore a dump-file as part of a shell script, we use this syntax instead:

`restore -rf /some/dump-file`

This is just slightly different - it performs the restore without needing any interaction. However, it also leaves behind a file called `restoresymtable` in the directory where you restore the dump-file. This file can be safely deleted.
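A scripted restore might look roughly like the following. This is a sketch, not one of our existing scripts: `run` and `scripted_restore` are made-up names, and with DRYRUN=1 (the default here) it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of a scripted restore. With DRYRUN=1 (the default here) the commands
# are only printed; set DRYRUN=0 to actually run them.
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# scripted_restore <dump-file> <target directory>
# The body runs in a subshell so the cd does not affect the caller.
scripted_restore() (
    dumpfile=$1
    target=$2
    run mkdir -p "$target" &&
    run cd "$target" &&
    run restore -rf "$dumpfile" &&
    run rm -f restoresymtable
)
```

Calling `scripted_restore /some/dump-file /mnt/data2/test` in dry-run mode prints the four commands in order. The final rm removes the restoresymtable file that `restore -rf` leaves in the target directory.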

One other consideration that is worth noting is that dump(8) does not save mount points of other partitions. If you dump / and /var is a separate partition, when you restore the dump-file, you will not have a /var directory in the directory where you restore the dump-file.

So now you should try these things out on your own. Install FreeBSD on a system and dump the entire / filesystem into a file. Strange as it may seem, you can actually dump / into /. For instance, if /data is not its own partition and is simply a directory under /, you can dump / into /data/dump-file.

After dumping / into a dump-file, dump /var into a file called dump-file-var. Now create any old directory and restore the dump-file. Then create a /var directory inside the target where you restored your dump-file, cd to it, and restore the dump-file-var file.

Presuming you had no other meaningful partitions besides /var that were on their own device, you now have a clone of your entire system inside /data (or wherever you restored things).

ipfw (firewalling)

ipfw is a userland command (/sbin/ipfw) that is, in a sense, the FreeBSD equivalent of the linux `iptables` command. Although the ipfw command is present in all FreeBSD installations, it also requires kernel support.

You can enable ipfw by adding this line to your kernel configuration file:

options IPFIREWALL

and building and installing that new kernel. It should be noted, however, that there are two other options that should probably be added to your kernel configuration file as well:

options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=100

The first allows ipfw to do verbose logging of packets and events, and the second limits the number of log entries that a single, consecutive event can log.

ipfw support can also be loaded as a module, by simply running:

/sbin/kldload ipfw.ko

It is VERY IMPORTANT to understand that ipfw, when existent in a kernel, or when loaded as a kernel module with kldload, will _always_ have a single default, final rule. That is, there will always be at least one rule present, and that rule will be the final, catch-all rule for any packet that does not match any prior rules.

By default, the final, default rule is:

deny ip from any to any

Which means (VERY IMPORTANT) that if you enable the firewall in your kernel and then reboot, you will have no network connectivity to the machine in question _at all_. The machine will have a single rule, which is set to deny all ip traffic, and you will not be able to reach it in any way.

Similarly, if you simply run `/sbin/kldload ipfw.ko` on a normal FreeBSD system, you will be instantly locked out of the system, and the system will be completely unreachable from the network.

There are a few ways to solve this problem. First, you can add a fourth line to your kernel configuration file:

options IPFIREWALL_DEFAULT_TO_ACCEPT

This means that ipfw support will be added to the kernel, but with a final, default rule of:

allow ip from any to any

With this rule in place as the final, default rule, even if you add no other rules, when the system restarts you will be able to see it on the network just fine, and no packets will be blocked.

Another way to solve the problem is to leave ipfw in the kernel set to the default (which is to deny ip from any to any) but add a rule to your startup configuration to open up some or all network services. The default will still be to deny ip from any to any, but any traffic that first matches your allow rules will be allowed in. This method will work fine, regardless of whether you added ipfw support in the kernel or by loading a module. In fact, you can (theoretically, although I distrust this method) run this command:

/sbin/kldload ipfw.ko ; ipfw add 65500 allow ip from any to any

and not be locked out ... the default rule is always number 65535, so by instantly adding a rule at 65500 to allow all traffic, you can load the ipfw module safely without locking yourself out. I do not recommend this, however, as I do not trust that this will work 100% of the time.

Finally, a third method to enable ipfw without locking yourself out is to rebuild the ipfw.ko module to set its default to allow, instead of deny - just like we were able to set a kernel configuration line to set the default to allow. Building this module with that custom configuration is beyond the scope of this document.

Why does FreeBSD have a default setting for ipfw to deny all traffic ? The reason is, you do not want to allow a malicious party to circumvent your firewall rules by crashing your machine. In some configurations, firewall rules may not be loaded at boot time, so if the default was "allow all from any to any" then one could circumvent the firewall rules by crashing the firewall - when it came back up the rules would not be loaded, and all traffic would pass.

REMEMBER - no matter what you do, there will always be a single rule in place - rule number 65535. You cannot delete this rule, and it is set to deny all or allow all depending on how it was set (as we discussed above).

You can view all of the rules currently active on a system by running:

/sbin/ipfw show

A useful way to see all the rules that might apply to a certain IP, for instance 10.10.10.10, is:

/sbin/ipfw show | grep "10.10.10.10"
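One caveat with the plain grep: the dots in the pattern match any character, and the pattern also matches inside longer addresses. The sketch below uses a made-up rule listing in place of real `ipfw show` output to show the difference that `-F` (literal string) and `-w` (whole word) make.

```shell
#!/bin/sh
# Sample rule listing standing in for `ipfw show` output (made-up rules).
rules='00100 0 0 allow tcp from any to 10.10.10.10 22
00200 0 0 allow tcp from any to 10.10.10.100 80
00300 0 0 allow tcp from any to 110.10.10.10 25'

# Plain grep also matches 10.10.10.100 and 110.10.10.10:
echo "$rules" | grep -c "10.10.10.10"       # prints 3
# -F matches the string literally, -w requires whole-word boundaries:
echo "$rules" | grep -cFw "10.10.10.10"     # prints 1
```

On a live system the equivalent would be `/sbin/ipfw show | grep -Fw "10.10.10.10"`.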

Here is what the results of `ipfw show` would look like on a system with a default accept configuration:

# ipfw show
65535 109490 44385056 allow ip from any to any

As you can see, there is only one rule - number 65535. You can only configure 65535 rules, and that is always the last one, and is always present.

The second number in the output is the packet count - how many packets have passed through that rule, and the third number is the byte count - how many bytes have passed through that rule.
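Since the packet count is the second column, the busiest rules can be listed with a sort on that column. The function name `busiest_rules` below is made up for illustration; it reads `ipfw show` output on standard input.

```shell
#!/bin/sh
# busiest_rules: read `ipfw show` output on stdin and print the rules with
# the highest packet counts first (column 2 is packets, column 3 is bytes).
# Optional argument: how many rules to print (default 10).
busiest_rules() {
    sort -k2,2nr | head -n "${1:-10}"
}
```

For example, `/sbin/ipfw show | busiest_rules 5` lists the five rules that have matched the most packets, which is a reasonable starting point when deciding which rules belong near the top of the ruleset.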

You add rules with commands like:

ipfw add 100 allow tcp from any to 10.10.10.10 22

That line allows all tcp traffic on port 22 destined for 10.10.10.10.

If you now run ipfw show, you would see:

# ipfw show
00100      0        0 allow tcp from any to 10.10.10.10 22
65535 116103 46464923 allow ip from any to any

You could then delete rule #100 with:

ipfw del 00100

A very useful and common setup for a system behind a firewall is to open up only the ports that correspond to services actually running on that system, and deny all other traffic. Here is an example, where the IP in question is 10.10.10.10:

ipfw add 100 allow tcp from any to 10.10.10.10 established
ipfw add 200 allow tcp from any to 10.10.10.10 22,25,80,443 setup
ipfw add 300 deny tcp from any to 10.10.10.10

So, in this example, we first allow any previously established tcp connections to this IP, then we allow any tcp connections that are in a setup (TCP 3-way handshake) mode - these two rules together account for all possible legitimate tcp traffic. Then the third rule simply denies all other tcp traffic.

As you can see, ipfw applies rules in order and the first match wins - as a packet travels from rule 0 toward rule 65535, as soon as it matches a rule, that rule's action (allow or deny) is applied and the packet leaves the ruleset - no further rules are applied to that packet.

In the above example, the user is running ssh, smtp, http, and https. However, if they also run a dns server, a few other things must be open for the system to be workable on the network, since dns uses UDP as well as TCP:

ipfw add 100 allow tcp from any to 10.10.10.10 established
ipfw add 200 allow tcp from any to 10.10.10.10 22,25,53,80,443 setup
ipfw add 300 allow udp from any to 10.10.10.10 53
ipfw add 400 deny ip from any to 10.10.10.10

Note the addition of rule 300, which allows udp to come in on port 53. Also notice that since dns uses both tcp and udp, we have added 53 to the allowed tcp list as well. Finally, note that we can refer to ipfw rule numbers as either 00200 or 200 - it is the same thing.

Also, note that the final rule for this IP, #400, does not simply deny all tcp or all udp (or both), rather, it denies all IP _period_. So if it doesn't match the tcp list, and if it is not udp53, then it gets dropped as soon as it hits rule 400.
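The per-IP block follows a fixed pattern (established, setup, optional udp, final deny), so it can be generated. The function below is a hypothetical sketch that only prints `ipfw add` commands; the name `host_rules` and the numbering in steps of 100 are assumptions modelled on the example above.

```shell
#!/bin/sh
# Hypothetical generator: print the per-IP rule block described above.
# host_rules <ip> <first rule number> <tcp ports> [udp ports]
host_rules() {
    ip=$1
    n=$2
    tcp=$3
    udp=${4:-}
    echo "ipfw add $n allow tcp from any to $ip established"
    echo "ipfw add $((n + 100)) allow tcp from any to $ip $tcp setup"
    if [ -n "$udp" ]; then
        echo "ipfw add $((n + 200)) allow udp from any to $ip $udp"
        n=$((n + 100))   # shift the final deny down by one slot
    fi
    echo "ipfw add $((n + 200)) deny ip from any to $ip"
}

host_rules 10.10.10.10 100 22,25,53,80,443 53
```

Run as shown, it prints the same four rules (100, 200, 300, 400) as the dns example. Note that it always ends with `deny ip`, not `deny tcp`.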

It should be noted that there is a performance penalty on the firewall machine for each rule added to it _that packets pass through_. (So, even if you have 1000 rules on the firewall, if the first rule is `allow ip from any to any` then performance is no different than if the ruleset had only one rule - all packets get passed right away and none pass through the other 999 rules.)

If a poor ruleset design is in use, and the firewall takes in a lot of traffic (or passes a DoS attack) it can knock the firewall off the network. The machine will not be crashed - as soon as the traffic lets up it will respond again - but no communication will pass while the CPU is overloaded.

The number one consideration when tuning the ruleset for performance is to pass packets that should be passed as fast as possible, and dump packets that should be dumped as fast as possible. For example, we know that we don't want to ever pass certain types of packets (a tcp packet with all option fields set, a tcp packet with no MSS setting, and so on) so, the very first four rules on our firewall are:

00003   49913883 2009604713 deny tcp from any to any tcpflags syn tcpoptions !mss
00003   23958169 1342587681 deny icmp from any to any icmptypes 4,5,9,10,12,13,14,15,16,17,18
00003        142       8496 deny tcp from any to any tcpflags syn,fin
00003          0          0 deny tcp from any to any tcpflags fin,psh,rst,urg

This means that these packets, if we ever encounter them (and we frequently do in a DoS attack) are dropped immediately. If we had 1000 rules in our ruleset, and put these at the end, a big DoS attack would cripple the firewall, because _every single_ other rule would have to be processed against every single packet until it got spit out the end.

On the other side of the coin, we also allow what we know should be allowed _as fast as we can_.

Take the example above:

ipfw add 100 allow tcp from any to 10.10.10.10 established
ipfw add 200 allow tcp from any to 10.10.10.10 22,25,80,443 setup
ipfw add 300 deny tcp from any to 10.10.10.10

Now, let's say we have four machines on our network, so we set up a block of rules for all four:

ipfw add 100 allow tcp from any to 10.10.10.10 established
ipfw add 150 allow tcp from any to 10.10.10.10 22,25,80,443 setup
ipfw add 200 deny tcp from any to 10.10.10.10
ipfw add 250 allow tcp from any to 10.10.10.20 established
ipfw add 300 allow tcp from any to 10.10.10.20 22,25,80,443 setup
ipfw add 350 deny tcp from any to 10.10.10.20
ipfw add 400 allow tcp from any to 10.10.10.30 established
ipfw add 450 allow tcp from any to 10.10.10.30 22,25,80,443 setup
ipfw add 500 deny tcp from any to 10.10.10.30
ipfw add 550 allow tcp from any to 10.10.10.40 established
ipfw add 600 allow tcp from any to 10.10.10.40 22,25,80,443 setup
ipfw add 650 deny tcp from any to 10.10.10.40

Now, each ruleset applies to a particular IP. The problem is, we can tell by looking at this that we _always_ pass established tcp packets – no matter what IP they are for. If an established packet comes in for 10.10.10.40, it first has to pass through 9 other rules before it gets to:

ipfw add 550 allow tcp from any to 10.10.10.40 established

Since we know we always pass established packets, we can shorten this dramatically by doing this:

ipfw add 001 allow tcp from any to any established
ipfw add 150 allow tcp from any to 10.10.10.10 22,25,80,443 setup
ipfw add 200 deny tcp from any to 10.10.10.10
ipfw add 300 allow tcp from any to 10.10.10.20 22,25,80,443 setup
ipfw add 350 deny tcp from any to 10.10.10.20
ipfw add 450 allow tcp from any to 10.10.10.30 22,25,80,443 setup
ipfw add 500 deny tcp from any to 10.10.10.30
ipfw add 600 allow tcp from any to 10.10.10.40 22,25,80,443 setup
ipfw add 650 deny tcp from any to 10.10.10.40

Now an established tcp connection (that we always want to pass) is passed at rule #1 - as fast as it possibly can be passed. If we have hundreds of systems behind this firewall, with rulesets such as this, this can cause the CPU of the firewall to be totally idle, vs. being 50% utilized - just by putting that change in. I have seen it. This is true because a large percentage of all traffic at all times is established tcp traffic.

Also, note that we always deny ip from any to (the IP) as soon as that IP's ruleset is over. This is not technically necessary - you could just let the final 65535 "deny all" rule catch it and deny it there - but that would kill performance because the packet would have to be matched against every other rule in between first.
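The effect of the reordering can be counted with a toy model. This is not real ipfw parsing: the one-line rule format (`<action> <dst ip or any> <state>`) and the function name `rules_checked` are invented for the illustration, and only destination and tcp state are considered.

```shell
#!/bin/sh
# Toy model of first-match evaluation, NOT real ipfw parsing.
# Each rule line on stdin: <action> <dst ip or "any"> <established|setup|any>
# rules_checked <dst ip> <established|setup> prints how many rules a tcp
# packet is compared against, counting the rule that finally matches.
rules_checked() {
    awk -v dst="$1" -v state="$2" '
        {
            checked++
            ip_ok    = ($2 == "any" || $2 == dst)
            state_ok = ($3 == "any" || $3 == state)
            if (ip_ok && state_ok) { print checked; found = 1; exit }
        }
        END { if (!found) print checked " (fell through to default rule)" }
    '
}

# Per-host layout: established/setup/deny repeated for each IP.
per_host=$(for h in 10 20 30 40; do
    echo "allow 10.10.10.$h established"
    echo "allow 10.10.10.$h setup"
    echo "deny 10.10.10.$h any"
done)

# Shared layout: one established rule up front, then setup/deny per IP.
shared=$(echo "allow any established"
for h in 10 20 30 40; do
    echo "allow 10.10.10.$h setup"
    echo "deny 10.10.10.$h any"
done)

echo "$per_host" | rules_checked 10.10.10.40 established   # prints 10
echo "$shared"   | rules_checked 10.10.10.40 established   # prints 1
```

In the per-host layout an established packet for 10.10.10.40 passes 9 rules and matches on the 10th; with the shared established rule it matches on the first.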

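If you omit the rule number, ipfw assigns one automatically: the highest existing non-default rule number plus the auto-increment step (100 by default), so the new rule lands at the end of the ruleset, just before the default rule 65535.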

Firewall Rule Configuration

The firewall startup script is found here:

/etc/firewall.sh

It is created periodically based on the current ruleset.

The only thing we do with ipfw on the firewall is block or accept packets and occasionally cap some ips (we do not do any counting, or accounting).

The first rule is to allow traffic pointed at the firewall itself to pass – this is to facilitate access in the event of a DoS attack.

00001 allow ip from any to 69.55.230.1

Rules 2-10 are for bandwidth capping and blocking bad people:

00002 pipe 2 ip from 69.55.224.109 to any xmit em0
00003 pipe 3 ip from { 69.55.227.54 or 69.55.227.55 } to any xmit em0
00004 pipe 4 ip from 69.55.238.194 to any xmit em0
00005 pipe 5 ip from 69.55.238.162 to any xmit em0
00006 deny ip from 69.22.167.138 to any

Rule 100 is for our infrastructure machines:

00100 allow udp from any 53 to 69.55.230.2
00100 allow udp from 69.55.230.2 123 to any
00100 allow udp from 69.55.230.2 to any dst-port 53
00100 allow tcp from any to 69.55.230.2 dst-port 22,25,80,443,110,123,1984,8080 setup
00100 allow icmp from any to 69.55.230.2 icmptypes 0,3,8 keep-state
00100 allow udp from 69.55.230.1 161 to 69.55.230.2
00100 deny ip from any to 69.55.230.2
00100 allow tcp from any to 65.55.238.150 dst-port 25 setup

Rules 101-150 are for jails/virts. They block all traffic from the pub net except from mail, backup, dns, and virtuozzo:

00101 deny ip from any to 69.55.238.120
00102 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.228.53
00102 deny ip from any to 69.55.228.53
00103 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.238.64
00103 deny ip from any to 69.55.238.64
00104 allow ip from { 69.55.230.2 or 69.55.230.9 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.238.92
00104 deny ip from any to 69.55.238.92
00106 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.238.180
00106 deny ip from any to 69.55.238.180
00107 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.238.210
00107 deny ip from any to 69.55.238.210
00109 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.237.129
00109 deny ip from any to 69.55.237.129
00110 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.236.128
00110 deny ip from any to 69.55.236.128
00111 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.236.92
00111 deny ip from any to 69.55.236.92
00112 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.235.200
00112 deny ip from any to 69.55.235.200
00113 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.225.2
00113 deny ip from any to 69.55.225.2
00114 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.226.128
00114 deny ip from any to 69.55.226.128
00115 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.224.32
00115 deny ip from any to 69.55.224.32
00116 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.224.110
00116 deny ip from any to 69.55.224.110
00117 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.228.2
00117 deny ip from any to 69.55.228.2
00130 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.227.2
00130 deny ip from any to 69.55.227.2
00132 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.237.220
00132 deny ip from any to 69.55.237.220
00133 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.236.192
00133 deny ip from any to 69.55.236.192
00134 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.236.64
00134 deny ip from any to 69.55.236.64
00135 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.235.170
00135 deny ip from any to 69.55.235.170
00136 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.234.151
00136 deny ip from any to 69.55.234.151
00137 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.225.77
00137 deny ip from any to 69.55.225.77
00138 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.226.2
00138 deny ip from any to 69.55.226.2
00139 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.226.161
00139 deny ip from any to 69.55.226.161
00140 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.224.150
00140 deny ip from any to 69.55.224.150
00141 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.227.70
00141 deny ip from any to 69.55.227.70
00141 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.229.2
00142 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.227.70
00142 deny ip from any to 69.55.227.70
00143 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.230.18
00143 deny ip from any to 69.55.230.18
00144 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.229.100
00144 deny ip from any to 69.55.229.100

In addition to rule 00010 (allow all established) and rule 65500 (allow all) we also have a few more special rules:

00012 deny tcp from any to any tcpflags syn tcpoptions !mss
00012 deny icmp from any to any icmptypes 4,5,9,10,12,13,14,15,16,17,18
00012 deny tcp from any to any tcpflags syn,fin
00012 deny tcp from any to any tcpflags fin,psh,rst,urg
00012 allow icmp from any to any
00013 allow udp from any to 69.55.225.225 dst-port 53
00014 deny tcp from any to any dst-port 135

These are the four DoS attack lines we have in place right at the beginning of the ruleset.

When the machine boots, it is running only three rules in total. You then log in and run /etc/firewall.sh, which contains all the additional rules. Running the script puts them all in place, and the firewall is then fully configured.

Now, by default, we do not put any rules in place at all for a customer - they are left wide open. Most customers do not ever change this. However, if a customer requests a ruleset on our firewall, we implement it in the general form that was described above - allow all ports that need to be open, and deny all others.

The firewall rule numbers are not numbered arbitrarily - they are numbered by customer number. So customer 327 gets 03270 - 03279, and customer 589 gets 05890 - 05899 ... once we get to customer 1000, they will have 10000 - 10009.
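The numbering scheme can be sketched as a small shell helper. This is only an illustration: the function name `cust_rule_range` is hypothetical and is not one of the scripts on the firewall. It assumes the customer number is given as significant digits only (no leading zeroes).

```shell
# Hypothetical helper (not part of the firewall tooling): print the block of
# ten rule numbers reserved for a customer number (significant digits only).
cust_rule_range() {
    cust=$1
    # Customer N owns rule numbers N*10 through N*10+9, zero-padded to 5 digits.
    printf '%05d - %05d\n' $(( cust * 10 )) $(( cust * 10 + 9 ))
}

cust_rule_range 327
cust_rule_range 1000
```

For customer 327 this prints the same 03270 - 03279 range given above.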

This does not mean that every customer can only have 10 rules - as you can see from the four DoS attack rules that are all numbered 00003, you can create multiple rules at the same rule number. I don't advise it though.

Because customer requests are generally "allow these and block everything else" we actually have a script on the firewall to create a typical ruleset. The script is called "rulemaker", and it runs like this:

# rulemaker
usage:  rulemaker [cust#] IP [port1,port2,...,port10]

So, it has three command line options - the customer number (significant digits only), the IP, and a comma-delimited list of ports (with no spaces).

So, if customer 398 comes to you and says:

"please open up tcp ports for ssh, smtp and http and close all the rest"

then you would run:

rulemaker 398 10.10.10.10 22,25,80

And this is what would happen:

gateway# rulemaker 398 10.10.10.10 22,25,80
/sbin/ipfw add 03981 allow udp from 10.10.10.10 to any 53
/sbin/ipfw add 03982 allow udp from any 53 to 10.10.10.10
/sbin/ipfw add 03983 allow tcp from any to 10.10.10.10 22,25,80 setup
/sbin/ipfw add 03989 deny ip from any to 10.10.10.10

or, if they have a dns server:

/sbin/ipfw add 03981 allow udp from 10.10.10.10 to any 53
/sbin/ipfw add 03982 allow udp from any 53 to 10.10.10.10
/sbin/ipfw add 03983 allow tcp from any to 10.10.10.10 22,25,53,80 setup
/sbin/ipfw add 03984 allow udp from any to 10.10.10.10 53
/sbin/ipfw add 03989 deny ip from any to 10.10.10.10

REMEMBER TO ADD YOUR PASTE TO /usr/local/etc/ipfw.sh
gateway#

If they run a dns server, put 53 in the port list on the rulemaker command line.

You are shown a list of rules to paste into place if they don't run a dns server, and one if they do.

Note that rulemaker does not actually put any rules in place at all; it just echoes the commands you should run. So, since the customer did not specify port 53, we can assume they do not run a dns server, and we can simply paste this:

/sbin/ipfw add 03981 allow udp from 10.10.10.10 to any 53
/sbin/ipfw add 03982 allow udp from any 53 to 10.10.10.10
/sbin/ipfw add 03983 allow tcp from any to 10.10.10.10 22,25,80 setup
/sbin/ipfw add 03989 deny ip from any to 10.10.10.10

into the shell, and hit enter once or twice afterwards. Very simple. We then email the customer and tell them that the lines are in place, and to test them.

Customer numbers larger than 999 will work fine with this script because:

ipfw add 010000 (rule)

and

ipfw add 10000 (rule)

translate into the same thing. So adding unnecessary zeroes does not hurt anything. (the rulemaker script outputs 0$1 as the rule number - so it always prepends a zero to make the three-digit customer numbers correct, and that zero prepended to a four digit customer number will not hurt anything - it will just be ignored)
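To make the behaviour concrete, here is a rough stand-in for what a script like rulemaker might look like. The real rulemaker source is not shown in this document, so the function name `rulemaker_sketch` and its internals are assumptions. Only the echoed command format and the 0$1 prefix come from the output above. Unlike the real script, which prints both blocks, this sketch prints a single block and adds the inbound udp 53 rule only when 53 is in the port list.

```shell
# Illustrative stand-in for rulemaker (the real script is not shown here).
# Like the real one, it only echoes commands; it never changes the firewall.
rulemaker_sketch() {
    cust=$1 ip=$2 ports=$3
    # "0${cust}N" mirrors the 0$1 prefix: a zero, the customer number, then
    # the last digit that selects the rule within the customer block.
    echo "/sbin/ipfw add 0${cust}1 allow udp from $ip to any 53"
    echo "/sbin/ipfw add 0${cust}2 allow udp from any 53 to $ip"
    echo "/sbin/ipfw add 0${cust}3 allow tcp from any to $ip $ports setup"
    # Assumption: emit the inbound udp 53 rule only when 53 is in the port list.
    case ",$ports," in
        *,53,*) echo "/sbin/ipfw add 0${cust}4 allow udp from any to $ip 53" ;;
    esac
    echo "/sbin/ipfw add 0${cust}9 deny ip from any to $ip"
}

rulemaker_sketch 398 10.10.10.10 22,25,80
```

As with the real script, the output still has to be pasted into the shell by hand.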

Almost every rule in the firewall is part of a little 4 or 5-line set like rulemaker outputs. Some exceptions are when people want you to open up icmp for them as well (since the above rulemaker output denies it) in which case you would simply paste the rulemaker output, and then afterwards add another rule:

ipfw add 03984 allow icmp from any to 10.10.10.10

Remember, if they run a dns server, they need to have tcp port 53 in their port list and you need to paste the second block that rulemaker outputs.

Some customers, however, do not request a formal ruleset - they simply say to block off port 3306 from the outside (mysql) or they say to block all netbios ports (135,137,139) or something like that. If they do this, do not use rulemaker - simply add a rule just for that:

/sbin/ipfw add 05431 deny tcp from any to 10.10.10.10 3306

or

/sbin/ipfw add 05431 deny tcp from any to 10.10.10.10 135,137,139

On the other hand, a customer may request a normal ruleset, but then request that you only open ssh for a certain IP block or IP. Here is an example of a ruleset that was started with rulemaker, but then additional rules were added:

07471      47802    3991038 allow udp from 69.55.225.125 to any 53
07472      14490    1309166 allow udp from any 53 to 69.55.225.125
07473      85950    4252824 allow tcp from any to 69.55.225.125 22,25,53,80,443,110,143,220 setup
07474      45358    3378454 allow udp from any to 69.55.225.125 53
07475         84       5016 allow tcp from any to 69.55.225.127 22,443
07475         94       5472 allow tcp from any to 69.55.225.128 22,443
07476      38805    3552124 allow icmp from any to 69.55.225.127
07476      38524    3536996 allow icmp from any to 69.55.225.128
07478          6        288 allow tcp from 66.166.221.232/29 to 69.55.225.125 3309
07478        286      13728 allow tcp from 66.166.221.232/29 to 69.55.225.125 3306
07479     109767    6222136 deny ip from any to { 69.55.225.125 or dst-ip 69.55.225.127 or dst-ip 69.55.225.128 }

So ... 69.55.225.125 is the main IP, and what was used in rulemaker, and the main allow line is very familiar:

07473      85950    4252824 allow tcp from any to 69.55.225.125 22,25,53,80,443,110,143,220 setup

but then they wanted to allow only 22 and 443 to the other two IP addresses:

07475         84       5016 allow tcp from any to 69.55.225.127 22,443
07475         94       5472 allow tcp from any to 69.55.225.128 22,443

(note they share an ipfw rule number)

then icmp should also be allowed to the other two IPs:

07476      38805    3552124 allow icmp from any to 69.55.225.127
07476      38524    3536996 allow icmp from any to 69.55.225.128

then there are two addresses out in the world that should be totally unfettered in their ability to talk to the main IP:

07478          6        288 allow tcp from 66.166.221.232/29 to 69.55.225.125

or to two ports

07478        286      13728 allow tcp from 66.166.221.232/29 to 69.55.225.125 3306
07478          6        288 allow tcp from 66.166.221.232/29 to 69.55.225.125 3309

(note, again, sharing ipfw numbers, and also specifying a netblock instead of a single IP: 66.166.221.232/29)

then finally, the last rule that rulemaker outputs was thrown out and this was used instead:

07479     109767    6222136 deny ip from any to { 69.55.225.125 or dst-ip 69.55.225.127 or dst-ip 69.55.225.128 }

Since we are dealing with three IPs total.

Some more example requests:

Replacing a rule (customer wants port 21 access):

gateway# g 69.55.225.3
07161      22462    1795170 allow udp from 69.55.225.3 to any dst-port 53
07162      21220    3283214 allow udp from any 53 to 69.55.225.3
07163      52962    2989600 allow tcp from any to 69.55.225.3 dst-port 22,80,443,25,110,995,143,993,53 setup
07164      20234    1314826 allow udp from any to 69.55.225.3 dst-port 53
07169      30715    2409544 deny ip from any to 69.55.225.3
gateway#
gateway# ipfw del 07163 ; ipfw add 07163 allow tcp from any to 69.55.225.3 20,21,22,80,443,25,110,995,143,993,53 setup
07163 allow tcp from any to 69.55.225.3 20,21,22,80,443,25,110,995,143,993,53 setup
gateway#

Blocking a range (customer says "Please block all traffic from this range of IPs: 195.238.48.0 - 195.238.63.255"):

gateway# g 69.55.226.144
08441        356      21668 allow udp from 69.55.226.144 to any dst-port 53
08442       6744    1114132 allow udp from any 53 to 69.55.226.144
08443       7358     411368 allow tcp from any to 69.55.226.144 dst-port 22,25,80,110,443 setup
08449       3135     280030 deny ip from any to 69.55.226.144
gateway#
gateway# ipfw add 08440 deny ip from 195.238.48.0/20 to 69.55.226.144

In reply, say “your ruleset is now…”
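The step from the customer's range (195.238.48.0 - 195.238.63.255) to the netblock used in the rule (195.238.48.0/20) is plain arithmetic. The sketch below shows one way to check it in the shell. The function names `ip_to_int` and `range_to_cidr` are made up for this example, and it only handles ranges that are exactly one aligned CIDR block.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into four positional parameters
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print start/prefix if the range start..end is exactly one CIDR block.
range_to_cidr() {
    start=$(ip_to_int "$1")
    end=$(ip_to_int "$2")
    size=$(( end - start + 1 ))
    if [ "$size" -lt 1 ]; then
        echo "end address is below start address" >&2
        return 1
    fi
    bits=0
    n=$size
    # The number of addresses must be a power of two.
    while [ "$n" -gt 1 ]; do
        if [ $(( n % 2 )) -ne 0 ]; then
            echo "range is not a single CIDR block" >&2
            return 1
        fi
        n=$(( n / 2 ))
        bits=$(( bits + 1 ))
    done
    # The start address must be aligned to the block size.
    if [ $(( start % size )) -ne 0 ]; then
        echo "range is not aligned to a CIDR boundary" >&2
        return 1
    fi
    echo "$1/$(( 32 - bits ))"
}

range_to_cidr 195.238.48.0 195.238.63.255
```

The range covers 4096 addresses (2^12), so 12 host bits remain and the prefix is /20.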

/etc/firewall.sh is backed up daily, both locally (/etc/oldrules) and to the backup server.

We add rules to block traffic from directly contacting our jails/virts. Each rule is basically the same except for the ID (which reflects the machine) and the machine’s IP. Here are some examples:

Jail2:

00102 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 69.55.238.150 } to 69.55.238.2
00102 deny ip from any to 69.55.238.2

Quar1:

00130 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.227.2
00130 deny ip from any to 69.55.227.2

Virt12:

00142 allow ip from { 69.55.230.2 or 69.55.230.10 or 69.55.225.225 or 80.89.140.126 or 12.109.148.175 or 69.64.46.27 or 194.67.59.14 or 69.55.238.150 } to 69.55.229.2
00142 deny ip from any to 69.55.229.2

The IPs listed for access are mail (the new mail), backup2, ns1c, and virtuozzo.

To dump/watch traffic:

tcpdump -vvv -n -i em1

Setting up bandwidth caps

Creating a new pipe to limit someone's outbound speed: First make sure that you're not about to use a pipe that already exists.

newgateway# ipfw pipe list
00001:   1.000 Mbit/s    0 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 tcp    69.55.224.109/44027    67.28.113.10/25      22     1320  0    0   0
newgateway#

There's already a pipe 1, so we'll use pipe 2. We're also going to add this as rule 2. (In this case the customer's IP is 69.55.224.109, and we only want to catch traffic going out, so we use xmit em0.)

newgateway# ipfw add 2 pipe 2 ip from 69.55.224.109 to any xmit em0
00002 pipe 2 ip from 69.55.224.109 to any xmit em0
newgateway#

Now all we have to do is set the speed limit:

newgateway# ipfw pipe 2 config bw 1Mbit/s
newgateway#

Lastly, list the pipes to make sure everything is the way we want it:

newgateway# ipfw pipe list
00001:   1.000 Mbit/s    0 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 tcp    69.55.224.109/44027    67.28.113.10/25     747    44980  0    0   0
00002:   1.000 Mbit/s    0 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 tcp    69.55.224.109/80      62.172.72.131/26327 8468  9259344 40 42972 1038
newgateway#
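The "make sure the pipe does not already exist" step can also be done mechanically. The helper below is a sketch only, and the name `next_pipe_number` is hypothetical. It reads the output of `ipfw pipe list` on stdin, assumes pipe header lines look like the `00001:` lines above, and prints one more than the highest pipe number in use. It does not reuse gaps left by deleted pipes.

```shell
# Read `ipfw pipe list` output on stdin and print one more than the highest
# pipe number in use (prints 1 when no pipes exist).
next_pipe_number() {
    awk -F: '/^[0-9]+:/ { n = $1 + 0; if (n > max) max = n }
             END { print max + 1 }'
}

# Typical use on the gateway:
#   ipfw pipe list | next_pipe_number
```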

Removing a pipe: the rule to match on and the pipe itself have to be deleted separately:

newgateway# ipfw delete 1
newgateway#

and to delete the pipe itself:

newgateway# ipfw pipe delete 1
newgateway#

list the pipes again:

newgateway# ipfw pipe show
00002:   1.000 Mbit/s    0 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 tcp    69.55.224.109/80      62.172.72.131/26327 38383 42636953 48 50955 5111
newgateway#

More than one rule can feed into a pipe, so the speed of everything that matches gets lumped together in the same pipe. This is useful when a customer has more than one IP or system and you want to limit his total combined speed.
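As an illustration of several rules feeding one pipe, the helper below echoes the commands for a list of IPs that should share a single cap. It is written in the same "echo, do not execute" spirit as rulemaker. The function name `shared_pipe_cmds`, the rule and pipe number 3, and the 10.10.10.x addresses are all placeholders, not values from the gateway.

```shell
# Hypothetical helper: echo (do not run) the commands that send outbound
# traffic from several IPs through one shared pipe.
shared_pipe_cmds() {
    rule=$1 pipe=$2 bw=$3
    shift 3
    # One matching rule per IP, all pointing at the same pipe.
    for ip in "$@"; do
        echo "ipfw add $rule pipe $pipe ip from $ip to any xmit em0"
    done
    # A single config line sets the combined speed limit.
    echo "ipfw pipe $pipe config bw $bw"
}

shared_pipe_cmds 3 3 1Mbit/s 10.10.10.10 10.10.10.11
```

Check `ipfw pipe list` and `ipfw show` first so that neither the pipe number nor the rule number collides with something already in use.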

ipfw on jail machines

The jail machines also have ipfw loaded, but on all of them it is loaded as default-accept. This is because ipfw on the jail machines is not used for firewalling; it is used for traffic counting. Note: ipfw will not run inside a jail, so a jail vps customer cannot have their own self-managed ipfw rules. In addition to deny and allow rules, you can also do things like:

ipfw add 00001 count ip from 10.10.10.10 to any
ipfw add 00002 count ip from any to 10.10.10.10

which counts traffic coming from that IP and traffic bound to it. Where does it count it? Simply in the ipfw rule itself. Here is a sample from one of the jail systems:

# ipfw show
00201        10          815 count ip from 198.78.70.176 to any
00202        10          802 count ip from any to 198.78.70.176
01631         1           62 count ip from 69.55.238.225 to any
01632         1          481 count ip from any to 69.55.238.225
01801        72        70154 count ip from 69.55.238.214 to any
01802        82         9047 count ip from any to 69.55.238.214
01811         3          245 count ip from 69.55.238.215 to any
01812         2          167 count ip from any to 69.55.238.215
01821         5          656 count ip from 198.78.66.216 to any
01822         4          377 count ip from any to 198.78.66.216
01841         0            0 count ip from 69.55.238.218 to any
01842         0            0 count ip from any to 69.55.238.218
01851         0            0 count ip from 198.78.66.219 to any
01852         0            0 count ip from any to 198.78.66.219
01861         3          218 count ip from 198.78.66.220 to any
01862         3          263 count ip from any to 198.78.66.220
01921         0            0 count ip from 69.55.238.224 to any
01922         0            0 count ip from any to 69.55.238.224
02241         0            0 count ip from 69.55.238.118 to any
02242         0            0 count ip from any to 69.55.238.118
02261         0            0 count ip from 69.55.238.223 to any
02262         0            0 count ip from any to 69.55.238.223
02271         0            0 count ip from 69.55.237.9 to any
02272         0            0 count ip from any to 69.55.237.9
02291        46        37023 count ip from 69.55.237.8 to any
02292        50        10939 count ip from any to 69.55.237.8
02311        20         1974 count ip from 69.55.237.7 to any
02312        22         1540 count ip from any to 69.55.237.7
03351         0            0 count ip from 69.55.237.163 to any
03352         0            0 count ip from any to 69.55.237.163
65535 102592563 113861945636 allow ip from any to any

Note two things - first, the packet counts and byte counts are very low, since this was taken shortly after system boot. Second, notice that the last line is "allow ip from any to any". That last line is the only line that affects actual traffic in any way - the others are just count rules.

Also, note that the rules are done by customer number - the customer number plus either a 1 or a 2 at the end - since every customer needs two total rules to count both inbound and outbound traffic. For instance:

01811         3          245 count ip from 69.55.238.215 to any
01812         2          167 count ip from any to 69.55.238.215

those are the two rules for customer 181 (col00181).
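Because the customer number is embedded in the rule number, the counters can be rolled up per customer straight from `ipfw show`. The sketch below is not the trafstats script (its source is not shown here), and the function name `traffic_by_customer` is hypothetical. It assumes the column layout of the sample output above and uses the direction of each rule ("from IP to any" or "from any to IP") to decide whether bytes are outbound or inbound.

```shell
# Read `ipfw show` output on stdin and print per-customer byte totals.
traffic_by_customer() {
    awk '$4 == "count" && $6 == "from" {
             cust = int($1 / 10)                  # drop the trailing 1 or 2
             seen[cust] = 1
             if ($9 == "any") outb[cust] += $3    # from <ip> to any
             if ($7 == "any") inb[cust]  += $3    # from any to <ip>
         }
         END {
             for (c in seen)
                 printf "%d out=%.0f in=%.0f\n", c, outb[c], inb[c]
         }' | sort -n
}

# Typical use on a jail machine:
#   ipfw show | traffic_by_customer
```

The final "allow ip from any to any" line is skipped because it is not a count rule.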

Remember how the `jailmake` utility asks for an "ipfw#" ? The three or four digit representation of the customer number is what it is asking for. As you can see, the last section of jailmake contains these lines:

/sbin/ipfw add `echo 0"$7"1` count ip from any to $ip
/sbin/ipfw add `echo 0"$7"2` count ip from $ip to any

since the IP and the ipfw# are specified on the jailmake command line, this is very easy to do. Note again that the rule number is prepended with a zero - which does not hurt anything if it is an extra zero in the case of a four digit customer number.

The jails do not add each individual ipfw line at boot time like the firewall does. They are added by the postboot script which should only be run once to avoid duplicate ipfw entries.

The only odd thing about ipfw on the jails is that it is loaded as a module on jails 1-10, and compiled into the kernel on jails 11 and beyond.

This is because doing traffic counting on the jails was not thought of until after we had loaded jail10. So, rather than schedule a maintenance and reboot all 10 jail systems, we simply built a default-allow module, placed it in the / directory, and loaded that at boot time. Therefore, on all jails 1-10, you will not only see something like this in the / directory:

ipfw.4.7.accept.ko

but you will also see a line like this in /usr/local/etc/rc.d/boot.sh:

/sbin/kldload /ipfw.4.7.accept.ko

(the module is named for the version of freebsd it was built for)

Jails 11-15 (and any future ones) have these lines:

options IPFIREWALL
options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=100
options IPFIREWALL_DEFAULT_TO_ACCEPT

in the kernel configuration file - note the default-accept line.

The traffic counting on the freebsd machines is not just for our benefit (to see by running `ipfw show`). There is also a cron job on every freebsd system:

4,9,14,19,24,29,34,39,44,49,55,59 * * * * /usr/local/jail/bin/trafstats

that matches up the rules with the directories, and every five minutes overwrites the users' /jc_traffic_dump with the latest traffic stats.

Handling a DoS attack

When an attack occurs, Castle will usually catch it and stop it by null routing the IP. Sometimes our internal doswatch script will catch a UDP flood (DoS), so you’ll be aware of a DoS attack in progress at about the same time Castle is. Other times it will look like the entire network is down, or probes will be flipping up and down.

If an attack is ongoing and Castle has not null routed the IP, you need to figure out the IP that's sending or receiving the attack. In the case of the former (an outgoing attack), it's usually easy to find the VPS/server once the IP is identified, and to cut them off by shutting off their port, dropping their traffic at the firewall, stopping the DoS'ing process on the VPS, or stopping the server.

When the attack is coming in, our attacked IP needs to be null routed to stop the attacker(s). However, when Castle takes their packet capture, they will almost always get a false positive when looking for the top talker. This is because the usual top talkers are rsync.net (69.43.165.0/27) and col01372 (69.55.234.230, 69.55.234.246), so any search should exclude those IPs and ranges. Any other IP they find should be closely inspected to make sure it's not one of our large customers, like col01372.
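If you take your own capture on the gateway, the same exclusions can be expressed as a capture filter. This is a sketch only: the function name `dos_capture_filter` is hypothetical, the em1 interface is borrowed from the tcpdump example earlier on this page, and Castle's own capture procedure may differ.

```shell
# Print a tcpdump filter that hides the usual top talkers
# (rsync.net and col01372) so they do not mask the real attack traffic.
dos_capture_filter() {
    echo "not net 69.43.165.0/27 and not host 69.55.234.230 and not host 69.55.234.246"
}

# Typical use on the gateway (left unquoted on purpose so the filter is
# passed as separate words):
#   tcpdump -n -i em1 $(dos_capture_filter)
dos_capture_filter
```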

Sometimes you’ll have no idea an attack has happened until you get an email from Castle that there was a DoS attack. When you do get notification that there was a DoS attack, and if a null route has been placed, it’s important to contact the customer ASAP to 1) notify him of what’s happened and 2) give him a new IP.

Tasks:

Create an entry in the Mgmt -> Reference -> DoSLog, using info fed to us by Castle (add the duration of the attack to the time of the attack to determine the time the attack ended). If he’s receiving a new IP (see below), add a note about what IP he was moved to: “moved to 69.55.239.XX”
Any customer who’s the target of an attack will lose his attacked IP and be given a new IP from the “bad boy” block (69.55.239.0/24). Any customer whose machine was used to DoS attack someone else does not need to receive a new IP.

Incoming DOS:

Send an “incoming dos” email to the customer to explain what happened. Take care to cc his alt contacts, especially if his email looks like it’s hosted on our server. Optional: you may want to preemptively update his DNS so that any domains pointed at the old IP will point at the new bad boy IP. The ipswap script is useful for this purpose if there are many domains.
Switch out the attacked IP for the bad boy IP: for VPS’s, remove the old IP and add the new one. For dedicateds, tell them they will need to assign the new IP and remove the old one from their server. Offer an IPKVM if they’re nervous. Remind them that they may need to change the gateway IP if the attacked IP was the primary IP on their server.
Once the attacked IP has been removed, respond to the DoS email from Castle telling them it’s ok to remove the null and what bad boy IP they were moved to (Castle needs confirmation on the move). Save the email from castle to the “dos” folder.

Outgoing DOS:

Do a ps and top on the customer’s VPS to see what process is doing the DoS attack. It’s almost always obvious, and the process name usually includes an IP address. Stop his VPS. Send an “outgoing dos” email to the customer to explain what happened, showing them their ps output and telling them we took them offline. In the case of a jail, tell them we will coordinate a time with them to bring it back online, when we know they’ll be waiting to log in and take action to patch/fix. In the case of a linux VPS, remind them they can fire it up via the control panel (give the link), unless the hacker has root access, in which case assume he can reach the control panel as well and either disable it or block all traffic to all ports. It’s good in these cases to offer to block off all ssh access (in our firewall) except from an IP they give us until the compromise has been identified and removed, ESPECIALLY if the DoS process is running as a real user (not apache or some other service), because the hacker could log back in the moment the server comes back up. Take care to cc his alt contacts, especially if his email looks like it’s hosted on our server.
Save the email to the “dos” folder.
Work with the customer to definitively identify how the hacker got in in the first place (and to prevent a future occurrence). Possibly offer the customer a new VPS to move into since the old one is “tainted”. Encourage them to look at passwords and patch all software, esp web-based.

Notes on resizing gconcats (with growfs)

To figure out the new size of the a partition, subtract 16 from the c partition:

20G: 41943030 - 16 = 41943014
18G: 37748727 - 16 = 37748711
16G: 33554424 - 16 = 33554408
14G: 29360121 - 16 = 29360105
12G: 25165818 - 16 = 25165802
10G: 20971515 - 16 = 20971499
8G: 16777212 - 16 = 16777196
6G: 12582909 - 16 = 12582893
4G: 8388606 - 16 = 8388590

C partition:

20G: 4194304 * 10 - 10 = 41943030
18G: 4194304 * 9 - 9 = 37748727
16G: 4194304 * 8 - 8 = 33554424
14G: 4194304 * 7 - 7 = 29360121
12G: 4194304 * 6 - 6 = 25165818
10G: 4194304 * 5 - 5 = 20971515
8G: 4194304 * 4 - 4 = 16777212
6G: 4194304 * 3 - 3 = 12582909
4G: 4194304 * 2 - 2 = 8388606

or for 1G volumes:

2G (a partition): 4194302 - 16 = 4194286
2G (c partition): 2097152 * 2 - 2 = 4194302
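The tables above follow one pattern: the c partition is the number of volumes times the volume size in sectors, minus one sector per volume, and the a partition is 16 sectors smaller. The helper below reproduces that arithmetic for sizes not listed. The function names are hypothetical, and it assumes 512-byte sectors (1G = 2097152 sectors), which is what the figures above imply.

```shell
SECTORS_PER_GB=2097152   # 1G in 512-byte sectors (2G = 4194304, as in the tables)

# c partition size in sectors: total size minus one sector per volume.
# Arguments: total size in G, size of each volume in G.
c_partition_sectors() {
    total_gb=$1 chunk_gb=$2
    chunks=$(( total_gb / chunk_gb ))
    echo $(( chunks * chunk_gb * SECTORS_PER_GB - chunks ))
}

# a partition size in sectors: 16 less than the c partition.
a_partition_sectors() {
    echo $(( $(c_partition_sectors "$1" "$2") - 16 ))
}

c_partition_sectors 20 2
a_partition_sectors 20 2
```

For a 20G gconcat built from 2G volumes this gives the same 41943030 and 41943014 shown in the tables.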
