VPS Management

From JCWiki

FreeBSD VPS

Starting jails: Quad/Safe Files

FreeBSD customer systems do not start up automatically at boot time. When one of our FreeBSD machines boots, it brings up the host and nothing else. To start jails, we put the commands to start each jail into one or more shell scripts and run them by hand. Jail startup needs to be actively monitored, which is why we don’t run the scripts automatically. More on monitoring later.

NOTE: On 7.x and later we have moved to a single quad file, quad1. Startups are no longer done by running each quad; instead we run startalljails, which relies on the contents of quad1. The specifics are covered lower in this article. What follows here applies to pre-7.x systems.

There are eight files in /usr/local/jail/rc.d:

jail3# ls /usr/local/jail/rc.d/
quad1   quad2   quad3   quad4   safe1   safe2   safe3   safe4
jail3#

That is four quad files and four safe files.

Each file contains an equal share of the system startup blocks (the total number of jails divided by 4).

The reason for this is, if we make one large script to startup all the systems at boot time, it will take too long - the first system in the script will start up right after system boot, which is great, but the last system may not start for another 20 minutes.

Since there is no way to parallelize this within a single script, we simply open four terminals (in screen window 9) and run one script in each terminal. This way they all run simultaneously, and the very last system in each startup script gets started in a quarter of the time it would take with one large file.

The files are generally organized so that quad/safe 1&2 have only jails from disk 1, and quad/safe 3&4 have jails from disk 2. This helps ensure that only 2 fscks on any disk are going on at once. Further, they are balanced so that all quad/safe’s finish executing around the same time. We do this by making sure each quad/safe has a similar number of jails and represents a similar number of inodes (see js).
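
The balancing described above can be planned with a small script. The following is only a sketch: it assumes a hypothetical input of "hostname inode-count" lines (the real output format of js is not shown here), and it uses a simple greedy rule, largest jail first, each one into whichever file is currently lightest. balance_jails is an illustrative name, not one of our tools.

```shell
#!/bin/sh
# balance_jails N: read "hostname inodes" lines on stdin and print
# "bin hostname inodes", spreading the inode totals across N bins.
# Greedy rule: take the largest jail first and put it in the bin
# with the smallest running total (ties go to the lowest bin number).
balance_jails() {
    sort -k2,2nr | awk -v n="$1" '
        {
            best = 1
            for (i = 2; i <= n; i++)
                if (total[i] + 0 < total[best] + 0)
                    best = i
            total[best] += $2
            print best, $1, $2
        }'
}
```

Since quad/safe 1&2 and 3&4 are tied to separate disks, you would run this once per disk with N set to 2, feeding it only that disk's jails.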

The other, very important reason we do it this way, and this is the reason there are quad files and safe files, is that in the event of a system crash, every single vn-backed filesystem that was mounted at the time of system crash needs to be fsck'd. However, fsck'ing takes time, so if we shut the system down gracefully, we don't want to fsck.

Therefore, we have two sets of scripts - the four quad scripts are identical to the four safe scripts except for the fact that the quad scripts contain fsck commands for each filesystem.

So, if you shut a system down gracefully, start four terminals and run safe1 in window one, and safe2 in window 2, and so on.

If you crash, start four terminals (or go to screen window 9) and run quad1 in window one, and quad2 in window 2, and so on.

Here is a snip of (a 4.x version) quad2 from jail17:

vnconfig /dev/vn16 /mnt/data2/69.55.228.7-col00820
fsck -y /dev/vn16
mount /dev/vn16c /mnt/data2/69.55.228.7-col00820-DIR
chmod 0666 /mnt/data2/69.55.228.7-col00820-DIR/dev/null
jail /mnt/data2/69.55.228.7-col00820-DIR mail1.phimail.com 69.55.228.7 /bin/sh /etc/rc

# moved to data2 col00368
#vnconfig /dev/vn28 /mnt/data2/69.55.236.132-col00368
#fsck -y /dev/vn28
#mount /dev/vn28c /mnt/data2/69.55.236.132-col00368-DIR
#chmod 0666 /mnt/data2/69.55.236.132-col00368-DIR/dev/null
#jail /mnt/data2/69.55.236.132-col00368-DIR limehouse.org 69.55.236.132 /bin/sh /etc/rc

echo '### NOTE ### ^C @ Local package initialization: pgsqlmesg: /dev/ttyp1: Operation not permitted'
vnconfig /dev/vn22 /mnt/data2/69.55.228.13-col01063
fsck -y /dev/vn22
mount /dev/vn22c /mnt/data2/69.55.228.13-col01063-DIR
chmod 0666 /mnt/data2/69.55.228.13-col01063-DIR/dev/null
jail /mnt/data2/69.55.228.13-col01063-DIR www.widestream.com.au 69.55.228.13 /bin/sh /etc/rc

# cancelled col00106
#vnconfig /dev/vn15 /mnt/data2/69.55.238.5-col00106
#fsck -y /dev/vn15
#mount /dev/vn15c /mnt/data2/69.55.238.5-col00106-DIR
#chmod 0666 /mnt/data2/69.55.238.5-col00106-DIR/dev/null
#jail /mnt/data2/69.55.238.5-col00106-DIR mail.azebu.net 69.55.238.5 /bin/sh /etc/rc

As you can see, two of the systems specified are commented out - presumably those customers cancelled, or were moved to new servers.

Note that the vnconfig line is the simpler command line, not the longer one that was used when the system was first configured. Each block simply vnconfigs the filesystem, fscks it, and mounts it. The chmod line sets the permissions on the jail’s /dev/null, and the last command is the `jail` command used to start the system, which will be covered later.

Here is the safe2 file from jail17:

vnconfig /dev/vn16 /mnt/data2/69.55.228.7-col00820
mount /dev/vn16c /mnt/data2/69.55.228.7-col00820-DIR
chmod 0666 /mnt/data2/69.55.228.7-col00820-DIR/dev/null
jail /mnt/data2/69.55.228.7-col00820-DIR mail1.phimail.com 69.55.228.7 /bin/sh /etc/rc

# moved to data2 col00368
#vnconfig /dev/vn28 /mnt/data2/69.55.236.132-col00368
#mount /dev/vn28c /mnt/data2/69.55.236.132-col00368-DIR
#chmod 0666 /mnt/data2/69.55.236.132-col00368-DIR/dev/null
#jail /mnt/data2/69.55.236.132-col00368-DIR limehouse.org 69.55.236.132 /bin/sh /etc/rc

echo '### NOTE ### ^C @ Local package initialization: pgsqlmesg: /dev/ttyp1: Operation not permitted'
vnconfig /dev/vn22 /mnt/data2/69.55.228.13-col01063
mount /dev/vn22c /mnt/data2/69.55.228.13-col01063-DIR
chmod 0666 /mnt/data2/69.55.228.13-col01063-DIR/dev/null
jail /mnt/data2/69.55.228.13-col01063-DIR www.widestream.com.au 69.55.228.13 /bin/sh /etc/rc

# cancelled col00106
#vnconfig /dev/vn15 /mnt/data2/69.55.238.5-col00106
#mount /dev/vn15c /mnt/data2/69.55.238.5-col00106-DIR
#chmod 0666 /mnt/data2/69.55.238.5-col00106-DIR/dev/null
#jail /mnt/data2/69.55.238.5-col00106-DIR mail.azebu.net 69.55.238.5 /bin/sh /etc/rc

As you can see, it is exactly the same, but it does not have the fsck lines.
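
If you want to confirm that a safe file really is its quad file minus the fsck lines, a check along these lines should work. It is a sketch, not the buildsafe script itself; strip_fsck and safe_matches_quad are illustrative names.

```shell
#!/bin/sh
# strip_fsck FILE: print FILE without its fsck lines, whether the
# line is active ("fsck ...") or commented out ("#fsck ...").
strip_fsck() {
    grep -v -E '^#?fsck ' "$1"
}

# safe_matches_quad QUAD SAFE: succeed only if SAFE is byte-for-byte
# identical to QUAD with the fsck lines removed.
safe_matches_quad() {
    strip_fsck "$1" | cmp -s - "$2"
}
```

For example, `safe_matches_quad /usr/local/jail/rc.d/quad2 /usr/local/jail/rc.d/safe2 || echo "safe2 is out of date"`.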

Take a look at the last entry - note that the file is named:

/mnt/data2/69.55.238.5-col00106

and the mount point is named:

/mnt/data2/69.55.238.5-col00106-DIR

This is the general format on all the FreeBSD systems. The file is always named:

IP-custnumber

and the directory is named:

IP-custnumber-DIR
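
Because the naming is regular, scripts can recover the IP and customer number from either path using plain sh parameter expansion. A minimal sketch (jail_ip and jail_cust are illustrative helper names, not existing tools):

```shell
#!/bin/sh
# jail_ip PATH: print the IP part of "IP-custnumber" or "IP-custnumber-DIR".
jail_ip() {
    name=${1##*/}          # drop any leading directories
    echo "${name%%-*}"     # keep everything before the first "-"
}

# jail_cust PATH: print the customer number part of the same names.
jail_cust() {
    name=${1##*/}          # drop any leading directories
    name=${name%-DIR}      # drop the mount point suffix, if present
    echo "${name#*-}"      # keep everything after the first "-"
}
```

For example, `jail_cust /mnt/data2/69.55.238.5-col00106-DIR` prints col00106.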

If you run safe when you need a fsck, the mount will fail and jail will fail:

# mount /dev/vn1c /mnt/data2/jails/65.248.2.131-ns1.kozubik.com-DIR
mount: /dev/vn1c: Operation not permitted

No reboot needed, just run the quad script

Starting with 6.x jails, we added block delimiters to the quad/safe files. A block looks like:

echo '## begin ##: nuie.solaris.mu'
fsck -y /dev/concat/v30v31a
mount /dev/concat/v30v31a /mnt/data1/69.55.228.218-col01441-DIR
mount_devfs devfs /mnt/data1/69.55.228.218-col01441-DIR/dev
devfs -m /mnt/data1/69.55.228.218-col01441-DIR/dev rule -s 3 applyset
jail /mnt/data1/69.55.228.218-col01441-DIR nuie.solaris.mu 69.55.228.218 /bin/sh /etc/rc
echo '## end ##: nuie.solaris.mu'

These are more than just informative when running quads/safes: the echo lines MUST be present for certain tools to work properly. So it’s important that any change to the hostname also be made on the 2 echo lines. For example, if you try to startjail a jail using a hostname which is on the jail line but not on the echo lines, the command will return with host not found.
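
To illustrate why the echo lines matter, here is a sketch of how a tool might pull one jail's block out of a quad file by hostname, plus a check that the jail line agrees with the delimiters. This is an assumption about the general approach, not the actual code in startjail; jail_block and block_hostnames_ok are illustrative names.

```shell
#!/bin/sh
# jail_block QUADFILE HOSTNAME: print the start block for HOSTNAME,
# from its "## begin ##" echo line through its "## end ##" echo line.
# \047 is a single quote, written in octal so it can sit inside the
# single-quoted awk program.
jail_block() {
    awk -v h="$2" '
        $0 == ("echo \047## begin ##: " h "\047") { inblock = 1 }
        inblock                                   { print }
        $0 == ("echo \047## end ##: " h "\047")   { inblock = 0 }
    ' "$1"
}

# block_hostnames_ok QUADFILE HOSTNAME: succeed only if the block exists
# and its jail line (jail <path> <hostname> <ip> ...) uses HOSTNAME too.
block_hostnames_ok() {
    jail_block "$1" "$2" | awk -v h="$2" '
        $1 == "jail" { seen = 1; if ($3 != h) bad = 1 }
        END          { ok = (seen && !bad); exit(ok ? 0 : 1) }'
}
```

A block whose jail line was updated but whose echo lines were not will simply not be found under the new hostname, which matches the "host not found" behavior described above.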

FreeBSD 7.x+ notes

Starting with the release of FreeBSD 7.x, we are doing jail startups in a slightly different way. First, there is only 1 file: /usr/local/jail/rc.d/quad1. There are no other quads or corresponding safe files. The reason for this is twofold:

  1. We can pass -C to fsck, which tells it to skip the fsck if the fs is clean (no more need for safe files).
  2. We have a new startup script which can be launched multiple times, running in parallel to start jails, where quad1 is the master jail file.

Quad1 could still be run as a shell script, but it would take a very long time to run completely, so it’s not advisable unless you break it down into smaller chunks (like quad1, quad2, quad3, etc).

Here is a snip of (a 7.x version) quad1 from jail2:

echo '## begin ##: projects.tw.com'
mdconfig -a -t vnode -f /mnt/data1/69.55.230.46-col01213 -u 50
fsck -Cy /dev/md50c
mount /dev/md50c /mnt/data1/69.55.230.46-col01213-DIR
mount -t devfs devfs /mnt/data1/69.55.230.46-col01213-DIR/dev
devfs -m /mnt/data1/69.55.230.46-col01213-DIR/dev rule -s 3 applyset
jail /mnt/data1/69.55.230.46-col01213-DIR projects.tw.com 69.55.230.46 /bin/sh /etc/rc
echo '## end ##: projects.tw.com'
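
If you ever do need to run a 7.x quad1 by hand in smaller chunks, the begin/end delimiters make it easy to split. The following is a rough sketch only: split_quad is an illustrative name, the output file names are up to you, and anything outside a begin/end pair (such as a NOTE line placed above a block) is not copied.

```shell
#!/bin/sh
# split_quad QUADFILE N PREFIX: deal the "## begin ##" .. "## end ##"
# blocks of QUADFILE round-robin into PREFIX1 .. PREFIXN.
# The "." in each pattern matches the quote character after "echo ".
split_quad() {
    awk -v n="$2" -v prefix="$3" '
        /^echo .## begin ##: / { out = prefix ((count++ % n) + 1) }
        out != ""              { print > out }
        /^echo .## end ##: /   { out = "" }
    ' "$1"
}
```

For example, `split_quad /usr/local/jail/rc.d/quad1 4 /tmp/chunk` would write /tmp/chunk1 through /tmp/chunk4.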

Cancelled jails are no longer commented out and stored in quad1, rather they’re moved to /usr/local/jail/rc.d/deprecated

To start these jails, start the 4 ssh sessions as you would for a normal crash, but instead of running quad1-4, run startalljails in each window. IMPORTANT: before running startalljails, make sure you have run preboot once, as it clears out all the lockfiles and enables startalljails to work properly.

Problems with the quad/safe files

When you run the quad/safe files, there are two problems that can occur - either a particular system will hang during initialization, OR a system will spit out output to the screen, impeding your ability to do anything. Or both.

First off, when you start a jail, you see output like this:

Skipping disk checks ...
adjkerntz[25285]: sysctl(put_wallclock): Operation not permitted
Doing initial network setup:.
ifconfig: ioctl (SIOCDIFADDR): permission denied
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
Additional routing options: TCP keepalive=YESsysctl:
net.inet.tcp.always_keepalive: Operation not permitted.
Routing daemons:.
Additional daemons: syslogd.
Doing additional network setup:.
Starting final network daemons:.
ELF ldconfig path: /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting standard daemons: inetd cron sshd sendmail sendmail-clientmqueue.
Initial rc.i386 initialization:.
Configuring syscons: blanktime.
Additional ABI support:.
Local package initialization:.
Additional TCP options:.

Now, let's look at this line, near the end:

Local package initialization:.

This is where the list of daemons that are set to start at boot time will show up. You might see something like:

Local package initialization: mysqld apache sendmail sendmail-clientmqueue

Or something like this:

Local package initialization: postgres postfix apache

The problem is that many systems (about 4-5 per machine) will hang on that line. A system will get part of the way through the daemons to be started:

Local package initialization: mysqld apache

and will just sit there. Forever.

Fortunately, pressing ctrl-c will break out of it. Not only will it break out of it, but it will also continue on that same line and start the other daemons:

Local package initialization: mysqld apache ^c sendmail-clientmqueue

and then continue on to finish the startup, and then move to the next system to be started.

So what does this mean? It means that if a machine crashes and you start four screen windows to run four quads or four safes, you need to periodically cycle between them and see whether any systems are stuck at that point, causing their quad/safe file to hang. A good rule of thumb: if you see a system at that point in the startup, give it another 100 seconds. If it is still at the exact same spot, hit ctrl-c. It’s also a good idea to go back into the quad file and note, just before the first command in the jail’s startup block, that this jail tends to need a ctrl-c or more time, as follows:

echo '### NOTE ### slow sendmail'
echo '### NOTE ###: ^C @ Starting sendmail.'

NEVER hit ctrl-c repeatedly if you don't get an immediate response - that will cause the following jail’s startup commands to be aborted.

A second problem that can occur is that a jail - maybe the first one in that particular quad/safe, maybe the last one, or maybe one in the middle, will start spitting out status or error messages from one of its init scripts. This is not a problem - basically, hit enter a few times and see if you get a prompt - if you do get a prompt, that means that the quad/safe script has already completed. Therefore it is safe to log out (and log out of the user that you su'd from) and then log back in (if necessary).

The tricky case is when a system in the middle starts flooding the screen with messages, and you hit enter a few times and don't get a prompt. Are you not getting a prompt because some subsequent system is hanging at initialization, as discussed above? Or because that quad file is currently running an fsck? Usually you can tell by scrolling back in screen’s history to see what it was doing before you started getting the messages.

If you don’t get clues from history, you have to use your judgement. Instead of giving it 100 seconds to respond, perhaps give it 2-3 mins. If you still get no response (no prompt) when you hit enter, hit ctrl-c. However, be aware that you might be hitting ctrl-c in the middle of an fsck. In that case you will get an error like "filesystem still marked dirty", the mount for that system will fail and so will the jail command, and the next system in the quad file will then start starting up.

If this happens, wait until all the quad files have finished, then start that system manually.

If things really get weird (a screen flooded with errors, no prompt, and ctrl-c does nothing), give it ten minutes or so, then kill that window with ctrl-p, then k. Log in again, manually check which systems are running and which aren't, and manually start any that are not.

Don't EVER risk running a particular quad/safe file a second time. If the quad/safe script gets executed twice, reboot the machine immediately.

So, for all the above reasons, anytime a machine crashes and you run all the quads or all the safes, always check every jail afterwards to make sure it is running - even if you have no hangs or complications at all. Run this command:

jailpsall

Make sure they all show as running. If one does not show as running, check its /etc/rc.conf file to see whether it is using a different hostname before starting it manually.

Note: postboot also populates ipfw counts, so it should not be run multiple times; use jailpsall for subsequent extensive ps’ing.

One thing we have implemented to alleviate these startup hangs and noisy jails is to put jail start blocks that are slow or hangy at the bottom of the safe/quad file. Further, for each bad jail we add a note in each quad/safe just before the start block, something like:

echo '### NOTE ### ^C @ Local package initialization: pgsqlmesg: /dev/ttyp1: Operation not permitted'

That way we’ll be prepared to ^C when we see that message appear during the quad/safe startup process. If you observe a new, undocumented hang, then after the quad/safe has finished, place a line similar to the above in the quad file, move the jail start block to the end of the file, and run buildsafe.


Recovering from a crash

Diagnose whether you have a crash

The most important thing is to get the machine and all jails back up as soon as possible. Note the time; you’ll need it to create a crash log entry (Mgmt. -> Reference -> CrashLog). The first thing to do is head over to the serial console screen and see if there are any kernel error messages. Try to copy any messages you see (or just a sample of repeating messages) into the notes section of the crash log. If there are no messages, the machine may just be really busy: wait a bit (5-10 min) to see if it comes back. If it's still pinging, odds are it's very busy. Note that if you see messages about swap space being exhausted, the server is out of memory; however, it may recover briefly, long enough for you to get a jtop in to see who has launched a ton of procs (most likely) and then issue a quick jailkill to get it back under control.

If it doesn't come back, or the messages indicate a fatal error, you will need to proceed with a power cycle (ctrl+alt+del will not work).

Power cycle the server

If this machine is a Dell 2950 with a DRAC card, ssh into the DRAC card (as root, using the standard root pass) and issue:

racadm serveraction hardreset

If it is not, or if you can’t ssh into the DRAC card, you will need someone at the data center to power the machine off, wait 30 sec, then turn it back on. Make sure to re-attach via console:

tip jailX

immediately after power down.

(Re)attach to the console

Stay on the console the entire time during boot. As the BIOS posts- look out for the RAID card output- does everything look healthy? The output may be scrambled, look for "DEGRADED" or "FAILED". Once the OS starts booting you will be disconnected (dropped back to the shell on the console server) a couple times during the boot up. The reason you want to quickly re-attach is two-fold: 1. If you don’t reattach quickly then you won’t get any console output, 2. you want to be attached before the server potentially starts (an extensive) fsck. If you attach after the fsck begins, you’ll have seen no indication it started an fsck and the server will appear frozen during startup- no output, no response.

IMPORTANT NOTE: on some older FreeBSD systems, there will be no output to the video (KVM) console as it boots up. The console output is redirected to the serial port ... so if a jail crashes, and you attach a kvm, the output during the bootup procedure will not be shown on the screen. However, when the bootup is done, you will get a login prompt on the screen and will be able to log in as normal. /boot/loader.conf is where serial console redirect output lives, so comment that if you want to catch output on kvm. On newer systems it sends most output to both locations.

Assess the health of the server

Once the server boots up fully, you should be able to ssh in. Look around- make sure all the mounts are there and reporting the correct size/usage (i.e. /mnt/data1 /mnt/data2 /mnt/data3 - look in /etc/fstab to determine which mount points should be there), check to see if RAID mirrors are healthy. See megacli, aaccheck

Before you start the jails, you need to run preboot. This will do some assurance checks to make sure things are prepped to start the jails. Any issues that come out of preboot need to be addressed before starting jails.

Start jails

Customer jails (the VPSs) do not start up automatically at boot time. When a FreeBSD machine boots, it brings up the host and nothing else. To start jails, we put the commands to start each jail into one or more shell scripts and run them by hand. Jail startup needs to be actively monitored, which is why we don’t run the scripts automatically. More on starting jails is in "Starting jails: Quad/Safe Files" above.

In order to start jails, we run the quad files: quad1 quad2 quad3 and quad4 (on new systems there is only quad1). If the machine was cleanly rebooted- which wouldn't be the case if this was a crash, you may run the safe files (safe1 safe2 safe3 safe4) in lieu of quads.

Open up 4 logins to the server (use the windows in a9). In each of the 4 windows you will do one of the following:

If there is a startalljails script (and only quad1), run that command in each of the 4 windows. It will parse through the quad1 file and start each jail. Follow the instructions above (Problems with the quad/safe files) for monitoring startup. Note that you can be a little more lenient with jails that take a while to start: startalljails will work around the slow jails and start the rest. As long as there aren't 4 jails which are "hung" during startup, the rest will get started eventually.

-or-

If there is no startalljails script, there will be multiple quad files. Start one quad in each of the 4 windows, i.e. quad1 in window1, quad2 in window2 and so on. DO NOT start any quad twice. It will crash the server. If you accidentally do this, just jailkill all the jails which are in the quad and run the quad again. Follow the instructions above (Problems with the quad/safe files) for monitoring quad startup.

Note the time the last jail boots- this is what you will enter in the crash log.

Save the crash log.

Check to make sure all jails have started

There's a simple script which will make sure all jails have started, and enter the ipfw counter rules: postboot. Run postboot, which will do a jailps on each jail it finds (excluding commented out jails) in the quad file(s). We're looking for 2 things:

  1. systems spawning out of control or too many procs
  2. jails which haven't started

On 7.x and newer systems it will print out the problems (which jails haven't started) at the conclusion of postboot. On older systems you will need to watch closely to see if/when there's a problem, namely:

[hostname] doesnt exist on this server

When you get this message, it means one of 2 things:

  1. The jail really didn't start. When a jail doesn't start it usually boils down to a problem in the quad file. Perhaps the path name is wrong (data1 vs data2) or the name of the vn/md file is wrong. Once this is corrected, you will need to run the commands from the quad file manually, or you may use startjail <hostname>.
  2. The customer has changed their hostname (and not told us), so their jail is running, just under a different hostname. On systems with jls, this is easy to rectify. First, get the customer info with g <hostname>, then look for the customer in jls with jls | grep <col0XXXX>. From there you will see their new hostname. Update that hostname in the quad file (don't forget to edit it on the ## begin ## and ## end ## lines) and in mgmt. On older systems without jls this is harder; you will need to look further to find their hostname, perhaps in their /etc/rc.conf.
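
On systems with jls, the two cases can be told apart mechanically by matching on the jail's path rather than its hostname, since the path does not change when a customer renames their system. The sketch below is not part of postboot; unstarted_jails is an illustrative name, and it assumes you have saved the jls output to a file first.

```shell
#!/bin/sh
# unstarted_jails QUADFILE JLSFILE: print "hostname path" for every active
# (not commented out) jail line in QUADFILE whose path does not appear
# anywhere in the saved jls output. Jails that are running under a new
# hostname still match on path, so they are not reported.
unstarted_jails() {
    awk '
        FILENAME == ARGV[1] {
            for (i = 1; i <= NF; i++) running[$i] = 1
            next
        }
        $1 == "jail" && !($2 in running) { print $3, $2 }
    ' "$2" "$1"
}
```

For example: `jls > /tmp/jls.out; unstarted_jails /usr/local/jail/rc.d/quad1 /tmp/jls.out`.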


Once all jails are started, do some spot checks- try to ssh or browse to some customers, just to make sure things are really ok.

Problems un-mounting - and with mount_null’s

If you cannot unmount a filesystem because it says the filesystem is busy, it is usually for one of the following reasons:

a) the jail is still running

b) you are actually in that directory, even though the jail is stopped

c) there are still dev, null_mount or linprocfs mount points mounted inside that directory.

d) when trying to umount null_mounts that are really long and you get an error like “No such file or directory”, it’s an OS bug where the dir is truncated. No known fix

e) there are still files open somewhere inside the dir. Use fstat | grep <cid> to find the process that has files open

f) Starting with 6.x, the jail mechanism does a poor job of keeping track of processes running in a jail and if it thinks there are still procs running, it will refuse to umount the disk. If this is happening you should see a low number in the #REF column when you run jls. In this case you can safely umount -f the mount.

Please note -if you forcibly unmount a (4.x) filesystem that has null_mounts still mounted in it, the system will crash within 10-15 mins.
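
For reason (c), it helps to list everything still mounted underneath the jail's directory before touching the main mount. A minimal sketch, assuming the usual "<fs> on <mount point> (options)" layout of mount output and mount points without spaces; nested_mounts is an illustrative name.

```shell
#!/bin/sh
# nested_mounts DIR: read `mount` output on stdin and print every mount
# point underneath DIR (not DIR itself), deepest paths first, so they
# can be unmounted in the order printed.
nested_mounts() {
    awk -v d="${1%/}/" '
        $2 == "on" && index($3, d) == 1 { print $3 }
    ' | sort -r
}
```

Usage would look like `mount | nested_mounts /mnt/data1/69.55.230.46-col01213-DIR`. If it prints anything, unmount those first.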

FreeBSD VPS Management Tools

These files are located in /usr/local/jail/rc.d and /usr/local/jail/bin

jailmake

jailps

jailps [hostname]

DEPRECATED FOR jps: displays processes belonging to/running inside a jail. The command takes one (optional) argument – the hostname of the jail you wish to query. If you don’t supply an argument, all processes on the machine are listed and grouped by jail.

jps

jps [hostname]

displays processes belonging to/running inside a jail. The command takes one (optional) argument – the hostname or ID of the jail you wish to query.

jailkill

jailkill <hostname>

stops all processes running in a jail.

You can also run:

jailkill <JID>

problems

Occasionally you will hit an issue where a jail will not kill off:

jail9# jailkill www.domain.com
www.domain.com .. killed: none
jail9#

Because no processes are running under that hostname. You cannot use jailps.pl either:

jail9# jailps www.domain.com
www.domain.com doesn’t exist on this server
jail9#

The reasons for this are usually:

  • the jail is no longer running
  • the jail's hostname has changed

In this case,

>=6.x: run a jls|grep <jail's IP> to find the correct hostname, then update the quad file, then kill the jail.

<6.x: the first step is to cat their /etc/rc.conf file to see if you can tell what they set the new hostname to. This very often works. For example:

cat /mnt/data2/198.78.65.136-col00261-DIR/etc/rc.conf

But maybe they set the hostname with the hostname command, and the original hostname is still in /etc/rc.conf.

The welcome email clearly states that they should tell us if they change their hostname, so there is no problem in just emailing them and asking them what they set the new hostname to.

Once you know the new hostname OR if a customer simply emails to inform you that they have set the hostname to something different, you need to edit the quad and safe files that their system is in to input the new hostname.

However, if push comes to shove and you cannot find out the hostname from them or from their system, then you need to start doing some detective work.

The easiest thing to do is run jailps looking for a hostname similar to their original hostname. Or you could get into the /bin/sh shell by running:

/bin/sh

and then looking at every hostname of every process:

for f in `ls /proc` ; do cat /proc/$f/status ; done

and scanning for a hostname that is either similar to their original hostname, or that you don't see in any of the quad safe files.

This is very brute force though, and it is possible that catting every file in /proc is dangerous - I don't recommend it. A better thing would be to identify any processes that you know belong to this system – perhaps the reason you are trying to find this system is because they are running something bad - and just catting the status from only that PID.

On some machines there may be 2 systems both named www. Look at each one’s /etc/rc.conf and make sure they’re both really www. If they are, jailkill www, then jailps www to make sure nothing is still running. Then immediately restart the other one under its fqdn (as found from a reverse nslookup).

  • on >=6.x the hostname may not yet be hashed:
jail9 /# jls
 JID Hostname                    Path                                  IP Address(es)
   1 bitnet.dgate.org            /mnt/data1/69.55.232.50-col02094-DIR  69.55.232.50
   2 ns3.hctc.net                /mnt/data1/69.55.234.52-col01925-DIR  69.55.234.52
   3 bsd1                        /mnt/data1/69.55.232.44-col00155-DIR  69.55.232.44
   4 let2.bbag.org               /mnt/data1/69.55.230.92-col00202-DIR  69.55.230.92
   5 post.org                    /mnt/data2/69.55.232.51-col02095-DIR  69.55.232.51 ...
   6 ns2                         /mnt/data1/69.55.232.47-col01506-DIR  69.55.232.47 ...
   7 arlen.server.net            /mnt/data1/69.55.232.52-col01171-DIR  69.55.232.52
   8 deskfood.com                /mnt/data1/69.55.232.71-col00419-DIR  69.55.232.71
   9 mirage.confluentforms.com   /mnt/data1/69.55.232.54-col02105-DIR  69.55.232.54 ...
  10 beachmember.com             /mnt/data1/69.55.232.59-col02107-DIR  69.55.232.59
  11 www.agottem.com             /mnt/data1/69.55.232.60-col02109-DIR  69.55.232.60
  12 sdhobbit.myglance.org       /mnt/data1/69.55.236.82-col01708-DIR  69.55.236.82
  13 ns1.jnielsen.net            /mnt/data1/69.55.234.48-col00204-DIR  69.55.234.48 ...
  14 ymt.rollingegg.net          /mnt/data2/69.55.236.71-col01678-DIR  69.55.236.71
  15 verse.unixlore.net          /mnt/data1/69.55.232.58-col02131-DIR  69.55.232.58
  16 smcc-mail.org               /mnt/data2/69.55.232.68-col02144-DIR  69.55.232.68
  17 kasoutsuki.w4jdh.net        /mnt/data2/69.55.232.46-col02147-DIR  69.55.232.46
  18 dili.thium.net              /mnt/data2/69.55.232.80-col01901-DIR  69.55.232.80
  20 www.tekmarsis.com           /mnt/data2/69.55.232.66-col02155-DIR  69.55.232.66
  21 vps.yoxel.net               /mnt/data2/69.55.236.67-col01673-DIR  69.55.236.67
  22 smitty.twitalertz.com       /mnt/data2/69.55.232.84-col02153-DIR  69.55.232.84
  23 deliver4.klatha.com         /mnt/data2/69.55.232.67-col02160-DIR  69.55.232.67
  24 nideffer.com                /mnt/data2/69.55.232.65-col00412-DIR  69.55.232.65
  25 usa.hanyuan.com             /mnt/data2/69.55.232.57-col02163-DIR  69.55.232.57
  26 daifuku.ppbh.com            /mnt/data2/69.55.236.91-col01720-DIR  69.55.236.91
  27 collins.greencape.net       /mnt/data2/69.55.232.83-col01294-DIR  69.55.232.83
  28 ragebox.com                 /mnt/data2/69.55.230.104-col01278-DIR 69.55.230.104
  29 outside.mt.net              /mnt/data2/69.55.232.72-col02166-DIR  69.55.232.72
  30 vps.payneful.ca             /mnt/data2/69.55.234.98-col01999-DIR  69.55.234.98
  31 higgins                     /mnt/data2/69.55.232.87-col02165-DIR  69.55.232.87 ...
  32 ozymandius                  /mnt/data2/69.55.228.96-col01233-DIR  69.55.228.96
  33 trusted.realtors.org        /mnt/data2/69.55.238.72-col02170-DIR  69.55.238.72
  34 jc1.flanderous.com          /mnt/data2/69.55.239.22-col01504-DIR  69.55.239.22
  36 guppylog.com                /mnt/data2/69.55.238.73-col00036-DIR  69.55.238.73
  40 haliohost.com               /mnt/data2/69.55.234.41-col01916-DIR  69.55.234.41 ...
  41 satyr.jorge.cc              /mnt/data1/69.55.232.70-col01963-DIR  69.55.232.70
jail9 /# jailkill satyr.jorge.cc
ERROR: jail_: jail "satyr,jorge,cc" not found

Note how it's saying satyr,jorge,cc is not found, and not satyr.jorge.cc.

The jail subsystem tracks things using comma-delimited hostnames. That is created every few hours:

jail9 /# crontab -l
0 0,6,12,18 * * * /usr/local/jail/bin/sync_jail_names

So if we run this manually:

jail9 /# /usr/local/jail/bin/sync_jail_names

Then kill the jail:

jail9 /# jailkill satyr.jorge.cc

successfully killed: satyr,jorge,cc

It worked.
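
When reading errors like the one above, it can help to translate a hostname into the form the error message uses. This one-liner assumes the mapping is nothing more than dots replaced by commas, which is what the output above suggests; jail_internal_name is an illustrative name, not part of sync_jail_names.

```shell
#!/bin/sh
# jail_internal_name HOSTNAME: print HOSTNAME with every "." replaced
# by ",", matching the form seen in the jailkill error output.
jail_internal_name() {
    printf '%s\n' "$1" | tr . ,
}
```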

jailpsall

jailpsall

will run a jailps on all jails configured in the quad files (this is different from jailps with no arguments as it won’t help you find a “hidden” system)

jailpsw

jailpsw

will run a jailps with an extra -w to provide wider output

jt (>=7.x)

jt

displays the top 20 processes on the server (the top 20 processes from top) and which jail owns them. This is very helpful for determining who is doing what when the server is very busy.

jtop (>=7.x)

jtop

a wrapper for top displaying processes on the server and which jail owns them. Constantly updates, like top.

jtop (<7.x)

jtop

displays the top 20 processes on the server (the top 20 processes from top) and which jail owns them. This is very helpful for determining who is doing what when the server is very busy.

stopjail

stopjail <hostname> [1]

this will jailkill, umount and vnconfig -u a jail. If passed an optional 2nd argument, it will not exit before umounting and un-vnconfig’ing in the event jailkill returns no processes killed. This is useful if you just want to umount and vnconfig -u a jail you’ve already killed. It is intelligent in that it won’t try to umount or vnconfig -u if it’s not necessary.

startjail

startjail <hostname>

this will start vnconfig, mount (including linprocfs and null-mounts), and start a jail. Essentially, it reads the jail’s relevant block from the right quad file and executes it. It is intelligent in that it won’t try to mount or vnconfig if it’s not necessary.

jpid

jpid <pid>

displays information about a process – including which jail owns it. It’s the equivalent of running cat /proc/<pid>/status

canceljail

canceljail <hostname> [1]

this will stop a jail (the equivalent of stopjail), check for backups (offer to remove them from the backup server and the backup.config), rename the vnfile, remove the dir, and edit quad/safe. If passed an optional 2nd argument, it will not exit upon failing to kill any processes owned by the jail. This is useful if you just want to cancel a jail which is already stopped.

jls

jls [-v]

Lists all jails running:

JID #REF IP Address      Hostname                     Path
 101  135 69.55.224.148   mail.pc9.org                 /mnt/data2/69.55.224.148-col01034-DIR
#REF is the number of references or procs(?) running
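
Because the default listing has fixed columns, it is easy to script against. The following is a minimal sketch, assuming the five-column layout shown above (JID, #REF, IP Address, Hostname, Path). The function name jail_path is hypothetical and is not one of our tools.

```shell
# jail_path: read default `jls` output on stdin and print the path of the
# jail whose hostname matches $1. Hypothetical helper, not an existing tool.
# Assumes the column order shown above: JID, #REF, IP, Hostname, Path.
jail_path() {
    awk -v host="$1" 'NR > 1 && $4 == host { print $5 }'
}
```

It would be used as: jls | jail_path mail.pc9.org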

Running with -v will give you all IPs assigned to each jail (7.2 up)

JID #REF Hostname                     Path                                  IP Address(es)
 101  139 mail.pc9.org                 /mnt/data2/69.55.224.148-col01034-DIR 69.55.224.148 69.55.234.85

startalljails

startalljails

7.2+ only. This will parse through quad1 and start all jails. It uses lockfiles so it won’t try to start a jail more than once; multiple instances can therefore run in parallel without fear of starting a jail twice. If a jail startup gets stuck, you can ^C without fear of killing the script. IMPORTANT: before running startalljails, make sure you have run preboot once, as it clears out all the lockfiles and enables startalljails to work properly.
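
The lockfile idea can be illustrated with a short sketch. This is not the real startalljails. It assumes a lock directory passed as an argument, uses mkdir as the atomic "claim" step (we have not confirmed what the real script uses), and start_one is a hypothetical placeholder for the actual jail startup.

```shell
# start_one: placeholder for whatever actually starts a jail (hypothetical).
start_one() {
    echo "starting $1"
}

# start_all: read hostnames on stdin and start each jail at most once.
# $1 is a lock directory (hypothetical location). mkdir either creates the
# directory or fails, so only one of several parallel instances can claim
# a given jail.
start_all() {
    lockdir=$1
    while read -r host; do
        if mkdir "$lockdir/$host.lock" 2>/dev/null; then
            start_one "$host"
        else
            echo "skipping $host (already claimed)"
        fi
    done
}

# clear_locks: the kind of cleanup a preboot-style step would do so that a
# fresh run is able to claim every jail again.
clear_locks() {
    rm -rf "$1"/*.lock
}
```

Running start_all a second time against the same lock directory skips every jail, which is why stale locks have to be cleared before a fresh boot.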

aaccheck.sh

aaccheck.sh

displays the output of container list and task list from aaccli

backup

backup

backup script called nightly to update jail scripts and do customer backups

buildsafe

buildsafe

creates safe files based on quads (automatically removing the fsck’s). This will destructively overwrite safe files
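
The core of the transformation can be sketched in one line. This is only an illustration: it assumes every fsck command sits on a line of its own in the quad file, and strip_fsck is a hypothetical name, not the real buildsafe.

```shell
# strip_fsck: copy a quad file ($1) to a safe file ($2), dropping fsck lines.
# Assumes each fsck command occupies its own line. Like buildsafe, this
# overwrites the target file.
strip_fsck() {
    grep -v 'fsck' "$1" > "$2"
}
```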

checkload.pl

checkload.pl

this was intended to be set up as a cronjob to watch processes on a jail when the load rises above a certain level. Not currently in use.

checkprio.pl

checkprio.pl

will look for any process (other than the current shell’s csh, sh, sshd procs) with a non-normal priority and normalize it

diskusagemon

diskusagemon <mount point> <1k blocks>

watches a mount point’s disk use and exits when it reaches the level specified in the 2nd argument. This is useful when doing a restore and you want to be paged as it’s nearing completion. Best used as: diskusagemon /asd/asd 1234; pagexxx
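
The polling behaviour can be sketched as below. This is not the real script. It assumes the usual df -k layout with Used as the third column, and the polling interval argument is a hypothetical addition.

```shell
# used_kblocks: read `df -k` output on stdin and print the Used column
# (third field of the first data line).
used_kblocks() {
    awk 'NR == 2 { print $3 }'
}

# wait_for_usage: poll until mount point $1 uses at least $2 1k-blocks.
# $3 is the polling interval in seconds (hypothetical, defaults to 60).
wait_for_usage() {
    while :; do
        used=$(df -k "$1" | used_kblocks)
        # Give up rather than loop forever if df output could not be parsed.
        [ -n "$used" ] || return 1
        [ "$used" -ge "$2" ] && return 0
        sleep "${3:-60}"
    done
}
```

Because the function only returns once the threshold is reached, it can be chained the same way as the real tool: wait_for_usage /asd/asd 1234; pagexxx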

dumprestore

dumprestore <dumpfile>

this is a perl expect script which automatically enters ‘1’ and ‘y’. It seems to cause restore to fail to set owner permissions on large restores.

g

g <search>

greps the quad/safe files for the given search parameter

gather.pl

gather.pl

gathers up data about jails configured and writes to a file. Used for audits against the db

gb

gb <search>

greps backup.config for the given search parameter

gbg

gbg <search>

greps backup.config for the given search parameter and presents just the directories (for clean pasting)

ipfwbackup

ipfwbackup

writes ipfw traffic count data to a logfile

ipfwreset

ipfwreset

writes ipfw traffic count data to a logfile and resets counters to 0

js

js

output varies by OS version, but generally provides information about the base jail:

  • which vn’s are in use
  • disk usage
  • info about the contents of quads
  • the # of inodes represented by the jails contained in the group (133.2 in the example below), and how many jails per data mount, as well as subtotals
  • ips bound to the base machine but not in use by a jail
  • free gvinum volumes, or unused vn’s or used md’s

/usr/local/jail/rc.d/quad1:
        /mnt/data1 133.2 (1)
        /mnt/data2 1040.5 (7)
        total 1173.7 (8)
/usr/local/jail/rc.d/quad2:
        /mnt/data1 983.4 (6)
        total 983.4 (6)
/usr/local/jail/rc.d/quad3:
        /mnt/data1 693.4 (4)
        /mnt/data2 371.6 (3)
        total 1065 (7)
/usr/local/jail/rc.d/quad4:
        /mnt/data1 466.6 (3)
        /mnt/data2 882.2 (5)
        total 1348.8 (8)
/mnt/data1: 2276.6 (14)
/mnt/data2: 2294.3 (15)

Available IPs:
69.55.230.11 69.55.230.13 69.55.228.200

Available volumes:
v78 /mnt/data2 2G
v79 /mnt/data2 2G
v80 /mnt/data2 2G

load.pl

load.pl

feeds info to load mrtg - executed by inetd.

makevirginjail

makevirginjail

Only on some systems, makes an empty jail (doesn't do restore step)

mb

mb <mount|umount>

(nfs) mounts and umounts dirs to backup2. Shortcuts are mbm and mbu to mount and unmount.

notify.sh

notify.sh

emails reboot@johncompanies.com – intended to be called at boot time to alert us to a machine which panics and reboots and isn’t caught by bb or castle.

orphanedbackupwatch

orphanedbackupwatch

looks for directories on backup2 which aren’t configured in backup.config and offers to delete them

postboot

postboot

to be run after a machine reboot and quad/safe’s are done executing. It will:

  • do chmod 666 on each jail’s /dev/null
  • add ipfw counts
  • run jailpsall (so you can see if a configured jail isn’t running)
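
The first step can be sketched as a loop over running jails. This is an illustration only: it assumes the jail paths are taken from the last column of default jls output, which may not be how postboot finds them, and fix_dev_null is a hypothetical name.

```shell
# fix_dev_null: read default `jls` output on stdin and chmod 666 the
# dev/null under each jail's path (assumed to be the last column).
fix_dev_null() {
    awk 'NR > 1 { print $NF }' | while read -r path; do
        if [ -e "$path/dev/null" ]; then
            chmod 666 "$path/dev/null"
        fi
    done
}
```

It would be used as: jls | fix_dev_null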

preboot

preboot

to be run before running quad/safe – checks for misconfigurations:

  • a jail configured in a quad but not a safe
  • a jail is listed more than once in a quad
  • the ip assigned to a jail isn’t configured on the machine
  • alias numbering skips in the rc.conf (resulting in the above)
  • orphaned vnfile's that aren't mentioned in a quad/safe
  • ip mismatches between dir/vnfile name and the jail’s ip
  • dir/vnfiles's in quad/safe that don’t exist
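
Two of these checks reduce to simple list comparisons once the jail names have been extracted. The sketch below assumes one hostname per line. Pulling the hostnames out of the quad/safe files depends on the file format and is left out. Both function names are hypothetical.

```shell
# duplicate_jails: print hostnames that occur more than once on stdin,
# e.g. a jail listed twice in a quad.
duplicate_jails() {
    sort | uniq -d
}

# missing_from: print lines of file $1 that do not appear in file $2,
# e.g. jails configured in a quad but not in the matching safe.
missing_from() {
    grep -Fxv -f "$2" "$1"
}
```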

quadanalyze.pl

quadanalyze.pl

called by js, produces the info (seen above with js explanation) about the contents of quad (inode count, # of jails, etc.)

rsync.backup

rsync.backup

does customer backups (relies on backup.config)

taskdone

taskdone

when called will email support@johncompanies.com with the hostname of the machine from which it was executed as the subject

topten

topten

summarizes the top 10 traffic users (called by ipfwreset)

trafficgather.pl

trafficgather.pl [yy-mm]

sends a traffic usage summary by jail to support@johncompanies.com and payments@johncompanies.com. Optional arguments are year and month (must be in the past). If not passed, assumes last month. Relies on traffic logs created by ipfwreset and ipfwbackup
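
The "assumes last month" default means stepping back one month in yy-mm form, including across a year boundary. A minimal sketch of that arithmetic follows, with the two-digit year and month passed in explicitly. The function prev_month is hypothetical and is not taken from the script.

```shell
# prev_month: given a two-digit year ($1) and month ($2), print the
# previous month as yy-mm. Leading zeros are stripped before doing
# arithmetic so that 08 and 09 are not read as invalid octal numbers.
prev_month() {
    yy=${1#0}
    mm=${2#0}
    mm=$((mm - 1))
    if [ "$mm" -eq 0 ]; then
        mm=12
        yy=$(( (yy + 99) % 100 ))   # step back one year, wrapping 00 to 99
    fi
    printf '%02d-%02d\n' "$yy" "$mm"
}
```

For the current date it could be called as: prev_month "$(date +%y)" "$(date +%m)"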

trafficwatch.pl

trafficwatch.pl

checks traffic usage and emails support@johncompanies.com when a jail reaches the warning level (35G) and the limit (40G). We really aren’t using this anymore now that we have netflow.

trafstats

trafstats

writes ipfw traffic usage info by jail to a file called jc_traffic_dump in each jail’s / dir

truncate_jailmake

truncate_jailmake

a version of jailmake which creates truncated vnfiles.

vb

vb

the equivalent of: vi /usr/local/jail/bin/backup.config

vs (freebsd)

vs<n>

the equivalent of: vi /usr/local/jail/rc.d/safe<n>

vq<n>

the equivalent of: vi /usr/local/jail/rc.d/quad<n>

dumpremote

dumpremote <user@machine> </remote/location/file-dump> <vnX>

ex: dumpremote user@10.1.4.117 /mnt/data3/remote.echoditto.com-dump 7

this will dump a vn filesystem to a remote machine and location

oversellcheck

oversellcheck

displays how much a disk is oversold or undersold taking into account truncated vn files. Only for use on 4.x systems

mvbackups (freebsd)

mvbackups <dir> (1.1.1.1-col00001-DIR) <target_machine> (jail1) <target_dir> (data1)

moves backups from one location to another on the backup server, gives you the option to remove the entries from the current backup.config, and prints a simple paste command for adding the config to backup.config on the target server

jailnice

jailnice <hostname>

applies renice 19 [PID] and rtprio 31 -[PID] to each process in the given jail
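
As a sketch of the per-process step, the function below takes PIDs on stdin and prints the two commands described above instead of executing them, so the result can be reviewed first. How jailnice collects the PIDs belonging to a jail is not shown here. The name nice_commands is hypothetical.

```shell
# nice_commands: read PIDs on stdin (one per line) and print the renice
# and rtprio commands described above for each one. This is a dry run:
# nothing is executed.
nice_commands() {
    while read -r pid; do
        echo "renice 19 $pid"
        echo "rtprio 31 -$pid"
    done
}
```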

dumpremoterestore

dumpremoterestore <device> <ip of target machine> <dir on target machine>

ex: dumpremoterestore /dev/vn51 10.1.4.118 /mnt/data2/69.55.239.45-col00688-DIR

dumps a device and restores it to a directory on a remote machine. Requires that you enable root ssh on the remote machine.

psj

psj

shows just the procs running on the base system – a ps auxw but without jail’d procs present

perc5iraidchk

perc5iraidchk

checks for degraded arrays on Dell 2950 systems with Perc5/6 controllers

perc4eraidchk

perc4eraidchk

checks for degraded arrays on Dell 2850 systems with Perc4e/Di controllers

Virtuozzo VPS Management Tools

vm

cancelve

bwcap

vzstat

vwe