= FreeBSD Jails =

== Starting jails: Quad/Safe Files ==

FreeBSD customer systems do not start up automatically at boot time. When one of our FreeBSD machines boots, it boots the base system and nothing else. To start jails, we put the commands to start each jail into shell scripts and run those scripts. Jail startup needs to be actively monitored, which is why we don't just run the scripts automatically. More on monitoring later.

NOTE: on >=7.x systems we have moved to a single quad file: <tt>quad1</tt>. Startups are not done by running each quad, but rather with [[#startalljails|startalljails]], which relies on the contents of <tt>quad1</tt>. The specifics are lower in this article; what follows here applies to pre-7.x systems.

There are eight files in <tt>/usr/local/jail/rc.d</tt>:

<pre>jail3# ls /usr/local/jail/rc.d/
quad1   quad2   quad3   quad4   safe1   safe2   safe3   safe4
jail3#</pre>

four quad files and four safe files. Each file contains an even share of the system startup blocks (the total number of jails divided by 4). The reason for this split: if we made one large script to start all the systems at boot time, it would take too long - the first system in the script would start right after boot, which is great, but the last system might not start for another 20 minutes. Since there is no way to parallelize this during the startup procedure itself, we simply open four terminals (in screen window 9) and run one script in each. That way they all run simultaneously, and the very last system in each startup script gets started in a quarter of the time it would take with one large file.

The files are generally organized so that quad/safe 1&2 contain only jails from disk 1, and quad/safe 3&4 only jails from disk 2. This helps ensure that no more than two fscks are running on any one disk at once. They are further balanced so that all four quad/safes finish executing at around the same time: each quad/safe holds a similar number of jails representing a similar number of inodes (see js).

The other, very important reason we do it this way - and the reason there are both quad files and safe files - is that after a system crash, every vn-backed filesystem that was mounted at the time of the crash must be fsck'd. fsck'ing takes time, so after a graceful shutdown we don't want to fsck at all. Therefore we keep two sets of scripts: the four quad scripts are identical to the four safe scripts except that the quad scripts contain fsck commands for each filesystem.

So: if you shut a system down gracefully, open four terminals and run safe1 in window one, safe2 in window two, and so on. If you crash, open four terminals (or go to screen window 9) and run quad1 in window one, quad2 in window two, and so on.
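Since the quad files are balanced by hand, a quick way to sanity-check the split is to count the start blocks in each file. A minimal sketch, assuming one active <tt>jail</tt> command per startup block (commented-out blocks are skipped by the <tt>^jail</tt> anchor):

<pre># count jail-start blocks per quad file
for f in /usr/local/jail/rc.d/quad?; do
    echo "$f: `grep -c '^jail ' $f` jails"
done</pre>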
Here is a snip of (a 4.x version) quad2 from jail17:

<pre>vnconfig /dev/vn16 /mnt/data2/69.55.228.7-col00820
fsck -y /dev/vn16
mount /dev/vn16c /mnt/data2/69.55.228.7-col00820-DIR
chmod 0666 /mnt/data2/69.55.228.7-col00820-DIR/dev/null
jail /mnt/data2/69.55.228.7-col00820-DIR mail1.phimail.com 69.55.228.7 /bin/sh /etc/rc

# moved to data2 col00368
#vnconfig /dev/vn28 /mnt/data2/69.55.236.132-col00368
#fsck -y /dev/vn28
#mount /dev/vn28c /mnt/data2/69.55.236.132-col00368-DIR
#chmod 0666 /mnt/data2/69.55.236.132-col00368-DIR/dev/null
#jail /mnt/data2/69.55.236.132-col00368-DIR limehouse.org 69.55.236.132 /bin/sh /etc/rc

echo '### NOTE ### ^C @ Local package initialization: pgsqlmesg: /dev/ttyp1: Operation not permitted'
vnconfig /dev/vn22 /mnt/data2/69.55.228.13-col01063
fsck -y /dev/vn22
mount /dev/vn22c /mnt/data2/69.55.228.13-col01063-DIR
chmod 0666 /mnt/data2/69.55.228.13-col01063-DIR/dev/null
jail /mnt/data2/69.55.228.13-col01063-DIR www.widestream.com.au 69.55.228.13 /bin/sh /etc/rc

# cancelled col00106
#vnconfig /dev/vn15 /mnt/data2/69.55.238.5-col00106
#fsck -y /dev/vn15
#mount /dev/vn15c /mnt/data2/69.55.238.5-col00106-DIR
#chmod 0666 /mnt/data2/69.55.238.5-col00106-DIR/dev/null
#jail /mnt/data2/69.55.238.5-col00106-DIR mail.azebu.net 69.55.238.5 /bin/sh /etc/rc</pre>

Two of the systems are commented out - those customers cancelled or were moved to other servers. Note that the vnconfig line here is the short form, not the longer command line used when the filesystem was first created. Each block simply vnconfigs the filesystem, fscks it, and mounts it. The final command in each block is the `jail` command that actually starts the system - that will be covered later.

Here is the safe2 file from jail17:

<pre>vnconfig /dev/vn16 /mnt/data2/69.55.228.7-col00820
mount /dev/vn16c /mnt/data2/69.55.228.7-col00820-DIR
chmod 0666 /mnt/data2/69.55.228.7-col00820-DIR/dev/null
jail /mnt/data2/69.55.228.7-col00820-DIR mail1.phimail.com 69.55.228.7 /bin/sh /etc/rc

# moved to data2 col00368
#vnconfig /dev/vn28 /mnt/data2/69.55.236.132-col00368
#mount /dev/vn28c /mnt/data2/69.55.236.132-col00368-DIR
#chmod 0666 /mnt/data2/69.55.236.132-col00368-DIR/dev/null
#jail /mnt/data2/69.55.236.132-col00368-DIR limehouse.org 69.55.236.132 /bin/sh /etc/rc

echo '### NOTE ### ^C @ Local package initialization: pgsqlmesg: /dev/ttyp1: Operation not permitted'
vnconfig /dev/vn22 /mnt/data2/69.55.228.13-col01063
mount /dev/vn22c /mnt/data2/69.55.228.13-col01063-DIR
chmod 0666 /mnt/data2/69.55.228.13-col01063-DIR/dev/null
jail /mnt/data2/69.55.228.13-col01063-DIR www.widestream.com.au 69.55.228.13 /bin/sh /etc/rc

# cancelled col00106
#vnconfig /dev/vn15 /mnt/data2/69.55.238.5-col00106
#mount /dev/vn15c /mnt/data2/69.55.238.5-col00106-DIR
#chmod 0666 /mnt/data2/69.55.238.5-col00106-DIR/dev/null
#jail /mnt/data2/69.55.238.5-col00106-DIR mail.azebu.net 69.55.238.5 /bin/sh /etc/rc</pre>

It is exactly the same as quad2, except that it has no fsck lines.
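A hedged spot-check for that invariant - a safe file should be its quad minus the active fsck lines, so a one-liner like this should print no differences:

<pre># sketch: safe2 should equal quad2 with the fsck lines stripped
grep -v '^fsck' /usr/local/jail/rc.d/quad2 | diff - /usr/local/jail/rc.d/safe2</pre>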
Take a look at the last entry above: the vn file is named /mnt/data2/69.55.238.5-col00106 and the mount point is named /mnt/data2/69.55.238.5-col00106-DIR. This is the general format on all the FreeBSD systems: the file is always named IP-custnumber and the directory is always named IP-custnumber-DIR.

If you run a safe file when a fsck is actually needed, the mount will fail and so will the jail command:

<pre># mount /dev/vn1c /mnt/data2/jails/65.248.2.131-ns1.kozubik.com-DIR
mount: /dev/vn1c: Operation not permitted</pre>

No reboot is needed - just run the quad script instead.

Starting with 6.x jails, we added block delimiters to the quad/safe files. A block looks like:

<pre>echo '## begin ##: nuie.solaris.mu'
fsck -y /dev/concat/v30v31a
mount /dev/concat/v30v31a /mnt/data1/69.55.228.218-col01441-DIR
mount_devfs devfs /mnt/data1/69.55.228.218-col01441-DIR/dev
devfs -m /mnt/data1/69.55.228.218-col01441-DIR/dev rule -s 3 applyset
jail /mnt/data1/69.55.228.218-col01441-DIR nuie.solaris.mu 69.55.228.218 /bin/sh /etc/rc
echo '## end ##: nuie.solaris.mu'</pre>

These delimiters are more than just informative: the echo lines MUST be present for certain tools to work properly. Any update to a jail's hostname must therefore also be made on the two echo lines. For example, if you try to startjail a jail whose hostname is on the jail line but not on the echo lines, the command will return "host not found".

=== FreeBSD 7.x+ notes ===

Starting with the release of FreeBSD 7.x, we do jail startups slightly differently. There is only one file, <tt>/usr/local/jail/rc.d/quad1</tt> - no other quads and no corresponding safe files. The reason is twofold:

1. We can pass -C to fsck, which tells it to skip the fsck if the filesystem is clean (no more need for safe files).
2. We have a new startup script, [[#startalljails|startalljails]], which can be launched multiple times, running in parallel to start jails, with quad1 as the master jail file.

quad1 could still be run as a plain shell script, but it would take a very long time to run to completion, so that is not advisable; if you must, break it into smaller chunks (like quad1, quad2, quad3, etc.).

Here is a snip of (a 7.x version) quad1 from jail2:

<pre>echo '## begin ##: projects.tw.com'
mdconfig -a -t vnode -f /mnt/data1/69.55.230.46-col01213 -u 50
fsck -Cy /dev/md50c
mount /dev/md50c /mnt/data1/69.55.230.46-col01213-DIR
mount -t devfs devfs /mnt/data1/69.55.230.46-col01213-DIR/dev
devfs -m /mnt/data1/69.55.230.46-col01213-DIR/dev rule -s 3 applyset
jail /mnt/data1/69.55.230.46-col01213-DIR projects.tw.com 69.55.230.46 /bin/sh /etc/rc
echo '## end ##: projects.tw.com'</pre>

Cancelled jails are no longer commented out and kept in quad1; they are moved to <tt>/usr/local/jail/rc.d/deprecated</tt>.

To start these jails, open the 4 ssh sessions as you would for a normal crash, but instead of running quad1-4, run startalljails in each window. IMPORTANT: before running startalljails, make sure you have run preboot once - it clears out all the lockfiles and enables startalljails to work properly.
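The real startalljails is an internal script; the following is only a rough sketch of the idea as described above (the lockfile path, the atomicity of the claim, and the marker parsing are assumptions), showing why it is safe to launch several copies at once:

<pre>#!/bin/sh
# sketch of the startalljails idea - NOT the real script. Each copy walks
# the '## begin ##' markers in quad1, claims an unstarted jail via a
# lockfile (preboot clears these beforehand), and starts it - so four
# copies running in four windows start four jails at a time.
QUAD=/usr/local/jail/rc.d/quad1
grep '## begin ##' $QUAD | sed -e 's/.*##: //' -e "s/'.*//" |
while read host; do
    lock="/var/run/jailstart.$host.lock"   # hypothetical lockfile path
    if [ ! -e "$lock" ]; then              # real script presumably locks atomically
        touch "$lock"
        startjail "$host"                  # internal per-jail start tool
    fi
done</pre>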
== Problems with the quad/safe files ==

When you run the quad/safe files, two problems can occur: a particular system hangs during initialization, or a system spits output to the screen, impeding your ability to do anything. Or both.

First off, when you start a jail, you see output like this:

<pre>Skipping disk checks ...
adjkerntz[25285]: sysctl(put_wallclock): Operation not permitted
Doing initial network setup:.
ifconfig: ioctl (SIOCDIFADDR): permission denied
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
Additional routing options: TCP keepalive=YESsysctl: net.inet.tcp.always_keepalive: Operation not permitted.
Routing daemons:.
Additional daemons: syslogd.
Doing additional network setup:.
Starting final network daemons:.
ELF ldconfig path: /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib
a.out ldconfig path: /usr/lib/aout /usr/lib/compat/aout /usr/X11R6/lib/aout
Starting standard daemons: inetd cron sshd sendmail sendmail-clientmqueue.
Initial rc.i386 initialization:.
Configuring syscons: blanktime.
Additional ABI support:.
Local package initialization:.
Additional TCP options:.</pre>

Now look at this line, near the end:

<pre>Local package initialization:.</pre>

This is where the list of daemons set to start at boot time will show up. You might see something like:

<pre>Local package initialization: mysqld apache sendmail sendmail-clientmqueue</pre>

or something like:

<pre>Local package initialization: postgres postfix apache</pre>

The problem is that many systems (about 4-5 per machine) will hang on that line. It will get some of the way through the daemons to be started:

<pre>Local package initialization: mysqld apache</pre>

and just sit there. Forever. Fortunately, pressing ctrl-c will break out of it. Not only will it break out, it will continue on the same line and start the remaining daemons:

<pre>Local package initialization: mysqld apache ^c sendmail-clientmqueue</pre>

then finish the startup and move on to the next system.

So what does this mean? If a machine crashes and you start four screen windows to run four quads or four safes, you need to cycle between them periodically and check whether any system is stuck at that point, hanging its quad/safe file. A good rule of thumb: if you see a system at that point in the startup, give it another 100 seconds - if it is still at the exact same spot, hit ctrl-c. It is also a good idea to go back into the quad file (just before the first command in that jail's startup block) and note that this jail tends to need a ctrl-c or more time, as follows:

<pre>echo '### NOTE ### slow sendmail'
echo '### NOTE ###: ^C @ Starting sendmail.'</pre>

'''NEVER''' hit ctrl-c repeatedly if you don't get an immediate response - that will abort the startup commands of the following jail.

The second problem: a jail - maybe the first one in that particular quad/safe, maybe the last, maybe one in the middle - starts spitting status or error messages from one of its init scripts. This by itself is not a problem. Hit enter a few times and see if you get a prompt. If you do, the quad/safe script has already completed, so it is safe to log out (and log out of the user you su'd from) and log back in if necessary.

The tricky case is when a system in the middle starts flooding messages and you hit enter a few times and get no prompt. Are you not getting a prompt because some subsequent system is hanging at initialization, as discussed above? Or because that quad file is currently running an fsck? Usually you can tell by scrolling back in screen's history to see what it was doing before the messages started. If history gives no clues, use your judgement: instead of giving it 100 seconds to respond, give it 2-3 minutes, and if you still get no prompt when you hit enter, hit ctrl-c. Be aware, however, that you might still be hitting ctrl-c in the middle of an fsck.
If that happens, you will get an error like "filesystem still marked dirty", the vnconfig for it will fail and so will the jail command, and the next system in the quad file will start up. Just wait until all the quad files have finished, then start that system manually.

If things get really weird - a screen flooded with errors, no prompt, and ctrl-c does nothing - then eventually (give it ten minutes or so) kill that window with ctrl-p, then k, log in again, manually check which systems are running and which aren't, and manually start any that are not. Don't EVER risk running a particular quad/safe file a second time. If a quad/safe script gets executed twice, reboot the machine immediately.

For all the above reasons, any time a machine crashes and you run all the quads or all the safes, '''always''' check every jail afterwards to make sure it is running - even if you had no hangs or complications at all. Run: <tt>[[#jailpsall|jailpsall]]</tt>

Note: [[#postboot|postboot]] also populates ipfw counts, so it '''should not be run multiple times''' - use <tt>jailpsall</tt> for subsequent extensive ps'ing.

Make sure all jails show as running. If one does not, check its /etc/rc.conf first to see whether it is using a different hostname, before starting it manually.

One thing we have implemented to alleviate these startup hangs and noisy jails is to put jail start blocks that are slow or hang-prone at the bottom of the quad/safe file. Further, for each bad jail we note in each quad/safe, just before the start block, something like:

<pre>echo '### NOTE ### ^C @ Local package initialization: pgsqlmesg: /dev/ttyp1: Operation not permitted'</pre>

That way we are prepared to ^C when we see that message appear during the quad/safe startup. If you observe a new, undocumented hang, then '''after''' the quad/safe has finished, place a similar line in the quad file, move that jail's start block to the end of the file, and run [[#buildsafe|buildsafe]].

== Making new customer jail 4.x ==

1. run [[#js|js]] to figure out which partition and IP to put it on, find an unused vn, and choose which quad it should go in
2. use col00xxx for both hostnames if they don't give you a hostname
3. copy over dir and ip to pending customer screen

Usage: <tt>jailmake IP filepath vnX hostname shorthost quad/safe# ipfw# email [gb disk, default 4]</tt>

<pre>jail14# /tmp/jailmake 69.55.226.152 /mnt/data2/69.55.226.152-col00182 vn23 negev.cerastes.org negev 4 182 cerastes@cerastes.org</pre>

== Making new customer jail 6.x ==

1. run [[#js|js]] to figure out which gvinum vols are free (and which mnt (disk) each gvinum goes with), which IP to put it on, and choose which quad it should go in
2. use col00xxx for both hostnames if they don't give you a hostname
3. copy over dir and ip to pending customer screen
Usage: <tt>jailmake IP filepath vN[,vN] hostname shorthost quad/safe# ipfw# email</tt>

<pre>jail19# jailmake 69.55.236.17 /mnt/data1/69.55.236.17-col01574 v10 uncle-leo.commadev.com uncle-leo 1 1574 lee@commadev.com,lee@gmail.com</pre>

== Changing an ip for freebsd VPS ==

*stopjail (hostname)
*on systems using a vnfile or mdfile: rename the vnfile/mdfile, using the new ip
*rename the dir using the new ip
*edit the quad (make sure to update all lines: directory, vn/mdfile)
*buildsafe (not necessary on systems having only quad1)
*if you're adding an ip not already on the host machine: <tt>ipadd x.x.x.x</tt>
*startjail (hostname)
*if backups: <tt>mvbackups</tt> ...
*edit dir/ip in mgmt
*if there are any firewall rules, update them to use the new ip (be careful to check there aren't multiple rules per ipfw# - search by colo#)
*(if the customer asks and has domains) update their domains on ns1c
*optional: update ipfw counters on the host server
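As a concrete illustration of the list above, here is a hedged example run for a hypothetical customer col01234 moving from 69.55.226.10 to 69.55.228.20 on an md-based system (hostname, paths and numbers are placeholders):

<pre>stopjail example.customer.com
mv /mnt/data1/69.55.226.10-col01234 /mnt/data1/69.55.228.20-col01234          # mdfile
mv /mnt/data1/69.55.226.10-col01234-DIR /mnt/data1/69.55.228.20-col01234-DIR  # mount dir
vq1              # edit the quad block: directory and mdfile paths, ip on the jail line
buildsafe        # skip on systems having only quad1
ipadd 69.55.228.20               # only if the new ip is not already on the host
startjail example.customer.com
# then: mvbackups if backups exist, update dir/ip in mgmt, fix any ipfw rules</pre>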
== Rename a gconcat vol ==

*stop jail, unmount everything
*gconcat stop vol
*gconcat clear /dev/gvinum/vN
*gconcat clear /dev/gvinum/vN+1
*gconcat label -v newname /dev/gvinum/vN /dev/gvinum/vN+1
*bsdlabel /dev/concat/newname (make sure no partition has fstype unused; if so, change it to 4.2BSD)

== Remaking a VPS (on same jail) ==

# take him down (stopjail)
# make note of host, vn/gvin/md, ip
# remove from quad/safe
# move vnfile (rename)
# remove dir (jailmake will make a new one)
# remove ipfw counts
# jailmake
# update backup script
# update db with new dir/ip (use the "move" link, mark old as stopped today)
# update firewall if changed ip

== Recovering from a crash (FreeBSD) ==

=== Diagnose whether you have a crash ===

The most important thing is to get the machine and all jails back up as soon as possible. Note the time - you'll need it to create a crash log entry (Mgmt. -> Reference -> CrashLog). The first thing to do is head over to the [[Screen#Screen_Organization|serial console screen]] and see if there are any kernel error messages. Try to copy any messages (or just a sample of repeating messages) into the notes section of the crash log.

If there are no messages, the machine may just be really busy - wait a bit (5-10 min) to see if it comes back. If it's still pinging, odds are it's very busy. If you see messages about swap space exhausted, the server is obviously out of memory; it may recover briefly enough for you to get a jtop in to see who's launched a ton of procs (most likely) and then issue a quick jailkill to get it back under control. If it doesn't come back, or the messages indicate a fatal error, proceed with a power cycle (ctrl+alt+del will not work).

=== Power cycle the server ===

If the machine is not a Dell 2950 with a [[DRAC/RMM#DRAC|DRAC card]] (i.e. if you can't ssh into the DRAC card as root, using the standard root pass, and issue <tt>racadm serveraction hardreset</tt>), you will need someone at the data center to power the machine off, wait 30 sec, then turn it back on. Make sure to re-attach via console (<tt>tip jailX</tt>) immediately after power down.

=== (Re)attach to the console ===

Stay on the console the entire time during boot. As the BIOS posts, look out for the RAID card output - does everything look healthy? The output may be scrambled; look for "DEGRADED" or "FAILED". Once the OS starts booting you will be disconnected (dropped back to the shell on the console server) a couple of times during boot. You want to re-attach quickly for two reasons:

1. If you don't reattach quickly, you won't get any console output.
2. You want to be attached before the server ''potentially'' starts (an extensive) fsck. If you attach after the fsck begins, you will have seen no indication it started, and the server will appear frozen during startup - no output, no response.

IMPORTANT NOTE: on some older FreeBSD systems there is no output to the video (KVM) console during boot. The console output is redirected to the serial port, so if a jail host crashes and you attach a KVM, the output during the boot procedure will not be shown on the screen. When the boot is done, you will get a login prompt and can log in as normal. The serial console redirect lives in <tt>/boot/loader.conf</tt> - comment it out if you want to catch output on the KVM. Newer systems send most output to both locations.

=== Assess the health of the server ===

Once the server boots up fully, you should be able to ssh in. Look around: make sure all the mounts are there and reporting the correct size/usage (e.g. /mnt/data1 /mnt/data2 /mnt/data3 - look in /etc/fstab to determine which mount points should be there), and check that the RAID mirrors are healthy. See [[RAID_Cards#Common_CLI_commands_.28megacli.29|megacli]], [[#aaccheck|aaccheck]].

Before you start the jails, run [[#preboot|preboot]]. It does some assurance checks to make sure things are prepped to start the jails. Any issues that come out of preboot need to be addressed before starting jails.

=== Start jails ===

[[#Starting_jails:_Quad.2FSafe_Files|More on starting jails]]

Customer jails (the VPSs) do not start automatically at boot time. When a FreeBSD machine boots, it boots the base system and does nothing else. To start jails, we run the startup scripts and actively monitor them.

To start jails, run the quad files: quad1 quad2 quad3 quad4 (on new systems there is only quad1). If the machine was cleanly rebooted - which wouldn't be the case if this was a crash - you may run the safe files (safe1 safe2 safe3 safe4) in lieu of quads. Open up 4 logins to the server (use the windows in [[Screen#Screen_Organization|a9]]). In each of the 4 windows:

If there is a [[#startalljails|startalljails]] script (and only quad1), run that command in each of the 4 windows. It will parse through the quad1 file and start each jail. Follow the instructions [[#Problems_with_the_quad.2Fsafe_files|here]] for monitoring startup. You can be a little more lenient with jails that take a while to start - startalljails will work around the slow jails and start the rest. As long as there aren't 4 jails "hung" during startup at once, the rest will get started eventually.

-or-

If there is no startalljails script, there will be multiple quad files. In each of the 4 windows, start one of the quads: quad1 in window 1, quad2 in window 2, and so on. DO NOT start any quad twice - it will crash the server. If you accidentally do, jailkill all the jails in that quad and run the quad again. Follow the instructions here for monitoring quad startup.

Note the time the last jail boots - this is what you will enter in the crash log. Save the crash log.
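Once the last jail is up, a quick hedged sanity check on 7.x+ systems is to compare the number of running jails against the number of start blocks in quad1, before the full postboot pass described next:

<pre># running jails (jls prints a header line, then one line per jail)
jls | sed 1d | wc -l
# expected jails, per the master quad file
grep -c '## begin ##' /usr/local/jail/rc.d/quad1</pre>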
=== Check to make sure all jails have started ===

There's a simple script that makes sure all jails have started and enters the ipfw counter rules: [[#postboot|postboot]]

Run postboot; it will do a jailps on each jail it finds in the quad file(s) (excluding commented-out jails). We're looking for two things:

# systems spawning out of control or with too many procs
# jails which haven't started

On 7.x and newer systems it will print out the problems (which jails haven't started) at the conclusion of postboot. On older systems you will need to watch closely to see if/when there's a problem, namely:

<pre>[hostname] doesnt exist on this server</pre>

This message means one of two things:

1. The jail really didn't start. This usually boils down to a problem in the quad file - perhaps the path name is wrong (data1 vs data2), or the name of the vn/mdfile is wrong. Once this is corrected, run the commands from the quad file manually, or use <tt>startjail <hostname></tt>

2. The customer has changed their hostname (and not told us), so their jail ''is'' running, just under a different hostname. On systems with jls this is easy to rectify. First, get the customer info: <tt>g <hostname></tt> Then look for the customer in jls: <tt>jls | grep <col0XXXX></tt> From there you will see their new hostname - update that hostname in the quad file (don't forget to edit it on the <tt>## begin ##</tt> and <tt>## end ##</tt> lines) and in mgmt. On older systems without jls this is harder; you will need to look further for their hostname - perhaps it's in their /etc/rc.conf (see the sketch after this section).

Once all jails are started, do some spot checks - try to ssh or browse to some customers, just to make sure things are really ok.
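For case 2 on systems without jls, a hedged shortcut is to read the hostname straight out of the customer's rc.conf from the host side (col01234 is a placeholder customer number):

<pre>grep hostname= /mnt/data*/*col01234-DIR/etc/rc.conf</pre>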
== Adding disk to a 7.x/8.x jail ==
or
== Moving customer to a different drive (md) ==

NOTE: this doesn't apply to mx2, which uses gvinum - use the same procedure as 6.x.

NOTE: if you unmount before mdconfig, re-mdconfig (attach), then unmount, then mdconfig -u again.

(parts to change/customize are <tt><span style="color:red">red</span></tt>)

If someone wants more disk space, there's a paste for it - send it to them (it explains about downtime, etc).

1. Figure out the space available from <tt>js</tt>. Ideally, you want to put the customer's new space on a different partition (and create the new md on the new partition).

2. Make a mental note of how much space they're currently using.

3. <tt>jailkill <hostname></tt>

4. Umount it (including their devfs) but leave the md config'd (so if you use stopjail, you will have to re-mdconfig it).

5. <tt>g <customerID></tt> to get the info (IP/cust#) needed for the new mdfile and mount name, and to see the current md device.

6a. When there's enough room to place the new system on an alternate, or the same, drive:

USE CAUTION not to overwrite (touch, mdconfig) an existing md!!<br>
<tt>touch /mnt/data<span style="color:red">3/69.55.234.66-col01334</span><br>
mdconfig -a -t vnode -s 10g -f /mnt/data3/69.55.234.66-col01334 -u 97<br>
newfs /dev/md97</tt>

Optional - if the new space is on a different drive, move the mount point directory AND use that directory in the mount and cd commands below:<br>
<tt>mv /mnt/data<span style="color:red">1/69.55.234.66-col01334-DIR</span> /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt>

<tt>mount /dev/md<span style="color:red">97</span> /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span><br>
cd /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt>

Confirm you are mounted to /dev/md<span style="color:red">97</span> and the space is correct:<br>
<tt>df .</tt>

Do the dump and pipe directly to restore:<br>
<tt>dump -0a -f - /dev/md<span style="color:red">1</span> | restore -r -f -<br>
rm restoresymtable</tt>

When the dump/restore completes successfully, use <tt>df</tt> to confirm the restored data size matches the original usage figure.

md-unconfig the old system:<br>
<tt>mdconfig -d -u <span style="color:red">1</span></tt>

Archive the old mdfile:<br>
<tt>mv /mnt/data<span style="color:red">1/69.55.237.26-col00241</span> /mnt/data<span style="color:red">1/old-col00241-mdfile-noarchive-20091211</span></tt>

Edit the quad (vq1) to point to the new (/mnt/data<span style="color:red">3</span>) location AND the new md number (md<span style="color:red">97</span>).

Restart the jail:<br>
<tt>startjail <hostname></tt>
6b. When there's not enough room on an alternate partition or the same drive, but there would be if you removed the existing customer's space:

Mount the backup nfs mounts:<br>
<tt>mbm</tt><br>
(run <tt>df</tt> to confirm the backup mounts are mounted)

Dump the customer to backup2 or backup1:<br>
<tt>dump -0a -f /backup<span style="color:red">4/col00241.20120329.noarchive.dump</span> /dev/md<span style="color:red">1</span></tt><br>
(when complete WITHOUT errors, <tt>du</tt> the dump file to confirm it roughly matches the usage)

Unconfigure and remove the old mdfile:<br>
<tt>mdconfig -d -u <span style="color:red">1</span><br>
rm /mnt/data<span style="color:red">1/69.55.237.26-col00241</span></tt><br>
(there should now be enough space to recreate the bigger system; if not, run sync a couple of times)

Create the new system (ok to reuse the old mdfile name and md#):<br>
<tt>touch /mnt/data<span style="color:red">1/69.55.234.66-col01334</span><br>
mdconfig -a -t vnode -s <span style="color:red">10</span>g -f /mnt/data<span style="color:red">1/69.55.234.66-col01334</span> -u <span style="color:red">1</span><br>
newfs /dev/md<span style="color:red">1</span><br>
mount /dev/md<span style="color:red">1</span> /mnt/data<span style="color:red">1/69.55.234.66-col01334-DIR</span><br>
cd /mnt/data<span style="color:red">1/69.55.234.66-col01334-DIR</span></tt>

Confirm you are mounted to /dev/md<span style="color:red">1</span> and the space is correct:<br>
<tt>df .</tt>

Do the restore from the dumpfile on the backup server:<br>
<tt>restore -r -f /backup<span style="color:red">4/col00241.20120329.noarchive.dump</span> .<br>
rm restoresymtable</tt>

When the restore completes successfully, use df to confirm the restored data size matches the original usage figure.

Umount nfs:<br>
<tt>mbu</tt>

If the md# (or mount point) changed, edit the quad (<tt>vq1</tt>) to point to the new (/mnt/data<span style="color:red">3</span>) location AND the new md number (md<span style="color:red">1</span>).

Restart the jail:<br>
<tt>startjail <hostname></tt>

7. Update disk (and dir if applicable) in the mgmt screen.

8. Update the backup list AND move backups, if applicable. Ex: <tt>mvbackups <span style="color:red">69.55.237.26-col00241</span> jail<span style="color:red">9</span> data<span style="color:red">3</span></tt>

9. Optional: archive the old mdfile:<br>
<tt>mbm<br>
gzip -c old-col01588-mdfile-noarchive-20120329 > /deprecated/old-col01588-mdfile-noarchive-20120329.gz<br>
mbu<br>
rm old-col01588-mdfile-noarchive-20120329</tt>
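One extra hedged safety habit for the md steps above: before detaching an md unit, confirm nothing from it is still mounted (md1 is a placeholder unit number):

<pre># only detach if no filesystem from md1 is still mounted
mount | grep -q '^/dev/md1 ' || mdconfig -d -u 1</pre>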
== Adding disk to a 6.x jail (gvinum/gconcat) ==
or
== Moving customer to a different drive (gvinum/gconcat) ==

(parts to change are <span style="color:red">highlighted</span>)

If someone wants more disk space, there's a paste for it - send it to them (it explains about downtime, etc).

1. Figure out the space available from [[#js|js]]. Ideally, you want to put the customer's new space on a different partition.

2. Make a mental note of how much space they're currently using.

3. <tt>[[#stopjail|stopjail]] <hostname></tt>

4. <tt>[[#g|g]] <customerID></tt> to get the info (IP/cust#) needed for the new mount name and the existing volume/device.

5a. When there's enough room to place the new system on an alternate, or the same, drive (using only UNUSED gvinum volumes - including any in use by the system in question):

Configure the new device:<br>
A. for a 2G system (single gvinum volume):<br>
<tt>bsdlabel -r -w /dev/gvinum/v<span style="color:red">123</span><br>
newfs /dev/gvinum/v<span style="color:red">123</span>a</tt><br>
-or-<br>
B. for a >2G system (create a gconcat volume):<br>
Try to grab a contiguous block of gvinum volumes. gconcat volumes MAY NOT span drives (i.e. you cannot use a gvinum volume from data3 and a volume from data2 in the same gconcat volume).

<tt>gconcat label <span style="color:red">v82-v84 /dev/gvinum/v82 /dev/gvinum/v83 /dev/gvinum/v84</span><br>
bsdlabel -r -w /dev/concat/<span style="color:red">v82-v84</span><br>
newfs /dev/concat/<span style="color:red">v82-v84</span>a</tt>

Other valid gconcat examples:<br>
<tt>gconcat label v82-v84v109v112 /dev/gvinum/v82 /dev/gvinum/v83 /dev/gvinum/v84 /dev/gvinum/v109 /dev/gvinum/v112<br>
gconcat label v82v83 /dev/gvinum/v82 /dev/gvinum/v83</tt><br>
Note: long names will truncate: v144v145v148-v115 truncates to v144v145v148-v1 (so you will refer to it as v144v145v148-v1 thereafter).

Optional - if the new volume is on a different drive, move the mount point directory (get the drive from js output) AND use that directory in the mount and cd commands below:<br>
<tt>mv /mnt/data<span style="color:red">1/69.55.234.66-col01334-DIR</span> /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt>

Mount, then confirm you are mounted to the device (<tt>/dev/gvinum/v<span style="color:red">123</span>a</tt> OR <tt>/dev/concat/<span style="color:red">v82-v84</span>a</tt>) and the space is correct:<br>
A. <tt>mount /dev/gvinum/v<span style="color:red">123</span>a /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt><br>
-or-<br>
B. <tt>mount /dev/concat/<span style="color:red">v82-v84</span>a /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt>

<tt>cd /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span><br>
df .</tt>

Do the dump and pipe directly to restore:<br>
<tt>dump -0a -f - /dev/gvinum/v<span style="color:red">1</span> | restore -r -f -<br>
rm restoresymtable</tt>

When the dump/restore completes successfully, use df to confirm the restored data size matches the original usage figure.

Edit the quad (<tt>vq<span style="color:red">1</span></tt>) to point to the new (/mnt/data<span style="color:red">3</span>) location AND the new volume (<tt>/dev/gvinum/v<span style="color:red">123</span>a</tt> or <tt>/dev/concat/<span style="color:red">v82-v84</span>a</tt>), then run <tt>buildsafe</tt>.

Restart the jail:<br>
<tt>startjail <hostname></tt>

5b. When there's not enough room on an alternate partition or the same drive, but there would be if you removed the existing customer's space (i.e. if you want/need to reuse the existing gvinum volumes and add more):

Mount the backup nfs mounts:<br>
<tt>mbm</tt><br>
(run df to confirm the backup mounts are mounted)

Dump the customer to backup2 or backup1:<br>
<tt>dump -0a -f /backup<span style="color:red">4/col00241.20120329.noarchive.dump</span> /dev/concat/<span style="color:red">v106v107</span></tt><br>
(when complete WITHOUT errors, du the dump file to confirm it roughly matches the usage)

Unconfigure the old gconcat volume. List the member gvinum volumes: <tt>gconcat list <span style="color:red">v106v107</span></tt>

Output will resemble:
<pre>Geom name: v106v107
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
ID: 3530663882
Providers:
1. Name: concat/v106v107
   Mediasize: 4294966272 (4.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: gvinum/sd/v106.p0.s0
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r1w1e3
   Start: 0
   End: 2147483136
2. Name: gvinum/sd/v107.p0.s0
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Mode: r1w1e3
   Start: 2147483136
   End: 4294966272</pre>

Stop the volume and clear the members:<br>
<tt>gconcat stop <span style="color:red">v106v107</span><br>
gconcat clear <span style="color:red">gvinum/sd/v106.p0.s0 gvinum/sd/v107.p0.s0</span></tt>

Create the new device - it's ok to reuse old/former members. Try to grab a contiguous block of gvinum volumes. gconcat volumes MAY NOT span drives (i.e. you cannot use a gvinum volume from data3 and a volume from data2 in the same gconcat volume).<br>
<tt>gconcat label <span style="color:red">v82-v84v106v107 /dev/gvinum/v82 /dev/gvinum/v83 /dev/gvinum/v84 /dev/gvinum/v106 /dev/gvinum/v107</span><br>
bsdlabel -r -w /dev/concat/<span style="color:red">v82-v84v106v107</span><br>
newfs /dev/concat/<span style="color:red">v82-v84v106v107</span>a</tt>

Optional - if the new volume is on a different drive, move the mount point directory (get the drive from js output) AND use that directory in the mount and cd commands below:<br>
<tt>mv /mnt/data<span style="color:red">1/69.55.234.66-col01334-DIR</span> /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt>

Mount, then confirm you are mounted to the device (/dev/concat/<span style="color:red">v82-v84v106v107</span>) and the space is correct:<br>
<tt>mount /dev/concat/<span style="color:red">v82-v84v106v107</span>a /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span></tt>

<tt>cd /mnt/data<span style="color:red">3/69.55.234.66-col01334-DIR</span><br>
df .</tt>

Do the restore from the dumpfile on the backup server:<br>
<tt>restore -r -f /backup<span style="color:red">4/col00241.20120329.noarchive.dump</span> .<br>
rm restoresymtable</tt>

When the restore completes successfully, use df to confirm the restored data size matches the original usage figure.

Edit the quad (<tt>vq<span style="color:red">1</span></tt>) to point to the new (/mnt/data<span style="color:red">3</span>) location AND the new volume (<tt>/dev/concat/<span style="color:red">v82-v84v106v107</span>a</tt>), then run buildsafe.

Restart the jail:<br>
<tt>startjail <hostname></tt>

TODO: clean up/clear the old gvinum/gconcat vol

6. Update disk (and dir if applicable) in the mgmt screen.

7. Update the backup list AND move backups, if applicable. Ex: <tt>mvbackups <span style="color:red">69.55.237.26-col00241</span> jail<span style="color:red">9</span> data<span style="color:red">3</span></tt>
DEPRECATED - steps to tack a new gvinum onto an existing gconcat. This leads to a corrupted fs; do not use:

<tt>bsdlabel -e /dev/concat/v82-v84</tt>

To figure out the new size of the c partition, multiply 4194304 by the # of 2G gvinum volumes and subtract the # of 2G volumes:

<pre>10G: 4194304 * 5 - 5 = 20971515
 8G: 4194304 * 4 - 4 = 16777212
 6G: 4194304 * 3 - 3 = 12582909
 4G: 4194304 * 2 - 2 = 8388606</pre>

To figure out the new size of the a partition, subtract 16 from the c partition:

<pre>10G: 20971515 - 16 = 20971499
 8G: 16777212 - 16 = 16777196
 6G: 12582909 - 16 = 12582893
 4G: 8388606 - 16 = 8388590</pre>

Orig:
<pre>8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  8388590       16    4.2BSD     2048 16384 28552
  c:  8388606        0    unused        0     0         # "raw" part, don't edit</pre>

New:
<pre>8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a: 12582893       16    4.2BSD     2048 16384 28552
  c: 12582909        0    unused        0     0         # "raw" part, don't edit</pre>

<tt>sync; sync<br>
growfs /dev/concat/v82-v84a<br>
fsck -fy /dev/concat/v82-v84a<br>
sync<br>
fsck -fy /dev/concat/v82-v84a</tt><br>
(keep running fscks till there are NO errors)

== Adding disk to a 4.x jail ==

NOTE: if you unmount before vnconfig, re-vnconfig, then unmount, then vnunconfig.

If someone wants more disk space, there's a paste for it - send it to them.

1. Figure out the space available from [[#js|js]]. Ideally, you want to dump to a different partition and create the new vn on the current partition. If there is no space to make the dump, move old vns to backup or, worse, dump to backup.

2. <tt>g <customerID></tt> to get the info.

3. Make the vnfile, newfs and mount. USE CAUTION not to overwrite an existing vn.<br>
Ex: <tt>vnconfig -T -S 9g -s labels -c /dev/vn30 /mnt/data2/65.214.160.117-col00496<br>
disklabel -r -w vn30 auto<br>
newfs /dev/vn30c<br>
mkdir /mnt/data2/65.214.160.117-col00496-DIR<br>
mount /dev/vn30c /mnt/data2/65.214.160.117-col00496-DIR<br>
cd /mnt/data2/65.214.160.117-col00496-DIR</tt>

4. <tt>jailkill <hostname></tt>

5. <tt>dump -0a -f - /dev/vn1 | restore -r -f -</tt>

6. <tt>rm restoresymtable</tt>

7. Unmount and un-vnconfig the old system:<br>
<tt>umount /dev/vnNNc<br>
vnconfig -u /dev/vnNN</tt>

8. Move the vnfile. Ex: <tt>mv /mnt/data1/69.55.237.26-col00241 /mnt/data1/old-col00241-vnfile-20110312-noarchive</tt>

9. Remove the old dir: <tt>rmdir /mnt/data1/69.55.237.26-col00241-DIR</tt>

10. Edit the quad/safe to point to the new location, run <tt>[[#buildsafe|buildsafe]]</tt>

11. Start the jail: <tt>[[#startjail|startjail]] <hostname></tt>

12. Update disk (and dir if applicable) in the mgmt screen.

13. mv backups if necessary.

WARNING: if you restore to an unmounted vn, you're actually restoring to /mnt/data1 - you can tar and follow the steps above to recover.

NOTE: you can't move a customer to a system whose base system differs from the one they came from (BSD 4.8 to 4.5 = won't work).

NOTE: we don't mount procfs on anyone's system by default.

NOTE: mount_nulls: goofy mount_nulls are also seen in df -k on jail2.<br>
These also need to be added in the safe and quad.<br>
If a customer wants another, use the safe/quad as an example and add it to the file; also exec the command at the prompt to add the mount (no reboot, remount or re-jail necessary).

== Moving customer to another jail machine ==

Systems should only be moved between similarly-versioned jails. If you cannot get a match, go to freebsd.org and see what the differences were between the version you're moving from and the new version.
Many of the updates will be to drivers and kernels - these don't affect the customer. Perhaps they updated traceroute or a library; in that case you should copy the new/changed files into the VPS/jail on the new host. An alternate method for moving to a different version is to buildworld to bring their full distribution up to date; optional steps for this kind of upgrade are included below.

1. <tt>g <customerID></tt>

2. <tt>[[#jailkill|jailkill]] <hostname></tt>

3. Create the new device on the target system.<br>
4.x:<br>
<pre>vnconfig -T -S 4g -s labels -c /dev/vn1 /mnt/data1/69.55.22x.x-col00XXX
disklabel -r -w vn1 auto
newfs /dev/vn1c</pre>
6.x:<br>
<pre>bsdlabel -r -w /dev/gvinum/v1
newfs /dev/gvinum/v1a

or

gconcat label v1-v3 /dev/gvinum/v1 /dev/gvinum/v2 /dev/gvinum/v3
bsdlabel -r -w /dev/concat/v1-v3
newfs /dev/concat/v1-v3a</pre>
7.x+:<br>
Run jailmakeempty, then skip steps 4 & 7 below.

4. Make and mount the dir on the new system.<br>
Ex:
<pre>mkdir /mnt/data2/69.55.230.3-col00123-DIR
chmod 755 /mnt/data2/69.55.230.3-col00123-DIR
mount [device] /mnt/data2/69.55.230.3-col00123-DIR</pre>

5. <tt>[[#stopjail|stopjail]] <hostname> 1</tt>

6. Dump the fs to the new system. Ex: <tt>[[#dumpremoterestore|dumpremoterestore]] /dev/vn51 10.1.4.118 /mnt/data2/69.55.239.45-col00688-DIR</tt> (make sure you can ssh as root on the remote machine)

6a. OPTIONAL BUILDWORLD:<br>
<pre>cd /usr/src
make world DESTDIR=/mnt/data2/69.55.xxx.xx-col0xxxx-DIR
cd etc
make distribution DESTDIR=/mnt/data2/69.55.xxx.xx-col0xxxx-DIR</pre>
(you may have to rm an openssh file and re-make dist)

<tt>rm -rf /mnt/data2/69.55.xxx.xx-col0xxxx-DIR/etc/periodic/daily/400.status-disks</tt>

<tt>vi /etc/periodic/security/100.chksetuid</tt><br>
replace: <tt>MP=`mount -t ufs | grep -v " nosuid" | awk '{ print $3 }' | sort`</tt><br>
with: <tt>MP='/'</tt> (use single quotes)

Ask the user if they want ports overwritten with current; if yes:<br>
<tt>cp -r /usr/ports /mnt/data2/69.55.xxx.xx-col0xxxx-DIR/usr</tt>

on source: <tt>cd /mnt/data2/69.55.xxx.xx-col0xxxx-DIR/etc; vipw -d .</tt> (copy out all info)<br>
on target: <tt>cd /mnt/data2/69.55.xxx.xx-col0xxxx-DIR/etc; vipw -d .</tt> (paste all info)<br>
on source: <tt>cat /mnt/data2/69.55.xxx.xx-col0xxxx-DIR/etc/group</tt> (copy out all info)<br>
on target: <tt>cat > /mnt/data2/69.55.xxx.xx-col0xxxx-DIR/etc/group</tt> (paste all info)

7. Edit the quad on the source system and copy the jail's entries over to the target system. Take care that the vn/gvinum/gconcat/md devices on the target system aren't in use, and check whether the /mnt/dataN path needs changing.

8. Run [[#buildsafe|buildsafe]] on the target system (if this is <=6.x) - it copies changes made to the quad into the safe file.

9. Remove the ip from the source system. Ex: <tt>ipdel 69.55.230.3</tt>

10. Add the ip to the target system. Ex: <tt>ipadd 69.55.230.3</tt>

11. Start the new system with [[#startjail|startjail]], or manually by pasting the entries found from running <tt>g <customerID></tt> on the new system.

12. Run <tt>[[#canceljail|canceljail]] col0xxxx</tt> on the source system. This should prompt you to remove backups if any existed. Do not let it do this; rather:

13. If backups existed, move them to the new host via the <tt>[[#mvbackups_.28freebsd.29|mvbackups]]</tt> script.

14. Edit the quad on the source system, changing the comment to reflect a move rather than a cancel, ex: <tt># moved to jail2 col00241</tt>

15. Edit mgmt to reflect the new host and dir for the new system.

16. Optional: add ipfw rules on the new system. Ex:
<pre>ipfw add 01231 count ip from 69.55.230.2 to any
ipfw add 01232 count ip from any to 69.55.230.2</pre>

== Increasing inodes for a VPS ==

When doing the newfs, pass a smaller bytes-per-inode value: <tt>newfs -i 4096 ...</tt>
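A hedged example (the device is a placeholder) - <tt>-i</tt> sets the bytes of data space per inode, so a smaller number yields more inodes on the same size filesystem:

<pre># one inode per 4096 bytes of data space (more inodes than the default density)
newfs -i 4096 /dev/md97</pre>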
== Jail services unable to contact services within same jail ==

This is due to a messed-up routing table (so far only seen on jail2):

<pre>route delete 69.55.228.65/32
route add -net 69.55.228.65/32 -iface fxp0 -nostatic -cloning
route add 69.55.228.65 -iface lo0</pre>

== /dev/null permission resets ==

Applies to 4.x jails only. For some reason that I do not understand at all, the /dev/null node in customer jails often reverts its permissions to 0600 after the system is restarted - which is bad, because non-root processes can no longer redirect to /dev/null. A fair number of server daemons will complain and/or fail when this is the case. So after a restart, once all systems have come back up, run postboot, which does something like:

<pre>for f in `df -k | grep /dev/vn | awk '{print $6}'` ; do chmod 0666 $f/dev/null ; done</pre>

Later, after the system is up, if anyone ever complains about their /dev/null permissions for any reason - really, if anyone mentions /dev/null in any way - tell them to run:

<pre>chmod 0666 /dev/null</pre>

We have ultimately fixed this by adding chmods to the quad/safes.

== mknod in a jail ==

For a customer who wants to run a chroot name server (named) in a jail (note: the original notes gave these in flag form; standard mknod syntax is name first):

<pre>cd /mnt/data1/<user directory>/var/named
mknod null c 1 1
mknod random c 2 3</pre>

== Postfix Problems ==

This hasn't happened in a long time, and is probably no longer valid with modern OSs. Postfix is an alternate MTA - people replace sendmail with it in much the same way that many people replace sendmail with qmail. The problem is that if you install postfix inside a jail, by default it will not work properly. Incoming mail generates errors like:

<pre>mail_queue_enter: create file incoming/298266.55902: File too large</pre>

and sending mail generates errors like:

<pre>postdrop: warning: mail_queue_enter: create file maildrop/627930.56676: File too large</pre>

This is very easy to solve. In fact, the FreeBSD welcome email sent out by `jailmake` now contains this line:

 - if you plan on installing postfix, email us first and ask for a necessary patch.

So any time you see a support email that says anything about postfix not working, paste that line from the welcome email into your response, and right away forward the postfix patch to them in an email. Here are the instructions: simply copy the patch into /usr/ports/mail/postfix/files, rename the file to 'patch-file_limit.c', then cd to /usr/ports/mail/postfix and run `make install`. The postfix patch is the very first email in the support email box, and we never delete or save it, because we always want it there to forward to people. The instructions in the email are very clear, and it solves their problem 100% of the time.

== Problems with `find` ==

There was a bug in the FreeBSD vn-filesystem code that causes the system to crash or hang when someone runs the `find` command inside their vn-backed filesystem (and even though the bug may have been fixed, we still don't like find, due to the i/o overhead). This does not always happen - however, any jail machine with more than 20 systems on it will crash every night if every system on it runs the daily periodic script out of its crontab and updates its locate database with the find command.
(The locate database is a small, FreeBSD-specific database that is populated nightly from a cron job and is then used to provide fast answers to the `locate` command.)

On normal systems (32-40 jails), all that has to be done is to make sure nobody runs the daily periodic. This means that /etc/crontab, instead of looking like this:

<pre># do daily/weekly/monthly maintenance
1   3   *   *   *   root    periodic daily
15  4   *   *   6   root    periodic weekly
30  5   1   *   *   root    periodic monthly</pre>

needs to look like this:

<pre># DO NOT UNCOMMENT THESE - contact support@johncompanies.com for details
#1   3   *   *   *   root    periodic daily
#15  4   *   *   6   root    periodic weekly
#30  5   1   *   *   root    periodic monthly
## DO NOT UNCOMMENT THESE ^^^^^^^^</pre>

and the problem will generally not occur at all. New filesystem images are always altered in this way, so all systems start out like this. However, if you have enough systems running, even casual uses of find outside the periodic scripts can crash the machine. We are seeing this on jail13, which crashes about once per week. There is not yet any resolution to this problem.

So, if a FreeBSD system crashes, see if anyone has uncommented their daily periodic line. You can check everyone by running <tt>/bin/sh</tt> and then (all one line, one command):

<pre>for f in `df -k | grep vn | awk '{print $6}'` ; do echo `cat $f/etc/crontab | grep "periodic daily"` $f ; done | more</pre>

Then edit those /etc/crontabs and comment the lines out again.

== Problems un-mounting - and with mount_nulls ==

If you cannot unmount a filesystem because it says the filesystem is busy, it is because of one of these (a hedged command checklist follows below):

a) the jail is still running<br>
b) you are actually in that directory, even though the jail is stopped<br>
c) there are still dev, null_mount or linprocfs mount points mounted inside that directory<br>
d) when trying to umount null_mounts with really long paths you get an error like "No such file or directory" - an OS bug where the dir name is truncated. No known fix.<br>
e) there are still files open somewhere inside the dir. Use <tt>fstat | grep <cid></tt> to find the process that has files open<br>
f) starting with 6.x, the jail mechanism does a poor job of keeping track of processes running in a jail, and if it thinks there are still procs running, it will refuse to umount the disk. If this is happening you should see a low number in the #REF column when you run jls. In this case you ''can'' safely <tt>umount -f</tt> the mount.

Please note: if you forcibly unmount a (4.x) filesystem that still has null_mounts mounted in it, the system '''will crash''' within 10-15 mins.
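Working through the causes above from the host side, a hedged checklist (col01234 is a placeholder customer id):

<pre>jls | grep col01234      # (a) jail still running? (and (f): check the #REF column)
pwd                      # (b) are you sitting inside the mount point yourself?
mount | grep col01234    # (c) devfs/null_mount/linprocfs still mounted inside?
fstat | grep col01234    # (e) processes with files open under the mount</pre>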
== Misc jail Items ==

We are overselling hard drive space on jail2, jail8, jail9, a couple of jails on jail17, jail4, jail12 and jail18. Even though a vn file shows 4G size, it doesn't actually occupy that amount of space on the disk. So be careful not to fill up drives where we're overselling - use oversellcheck to confirm you're not oversold by more than 10G. There are other truncated jails; they are generally noted in the file /root/truncated on the root system.

The act of moving a truncated vn to another system un-does the truncating: the truncated vn is filled with 0's and occupies the physical disk space for which it's configured. So use dumpremote to preserve the truncation.

* if you are getting disk-full messages for a BSD customer, it's fairly safe to clear out their /usr/ports/distfiles dir
* 4.x: ps and top can only be run by root in these jails. This is done on purpose: non-root users can run them, just not successfully, because we have locked the permissions on /dev/mem and /dev/kmem to be root-readable only
* user quotas do not work on FreeBSD jails - you cannot set up quotas at all, and that's that
* you cannot inject a process into a 4.x jail, only HUP running processes. From 6.x onward you can, with jexec
* jails see the base machine's uptime/load when running top/w
* if someone is unable to get in - can't ping, etc - see if they were blocked by castle (DoS), and see if their ip is on the system (post-reboot it may have been lost because it wasn't in rc.conf). preboot should catch that.
* in FreeBSD you can't su to root unless you belong to the wheel group - so if a customer removes their acct and sets up a new one, we have to add it to the wheel group (add to /etc/group)
* dmesg from the underlying system is seen in a customer's dmesg in the jail
* a popper process going crazy @ 40% for 10 min = someone who leaves mail on the server
* don't force umounts on 4.x jails - it crashes the machine. Generally ok on newer machines (running md)
* good admin book: http://search.barnesandnoble.com/booksearch/isbnInquiry.asp?userid=t824VyRAYz&isbn=0596005164&itm=2
* self-signed ssl cert: http://httpd.apache.org/docs/2.0/ssl/ssl_faq.html#selfcert
* conversation with Glenn about semaphores and pgsql:

<pre>SDBoody: hey, these are valid amounts/figures/increments right: kern.ipc.semmni=1280 kern.ipc.semmns=1280
gr8feen: probably... I always forget exactly what those are, so I usually have to look them up
SDBoody: semaphores
SDBoody: i took the current 1024 and added 256 to them
SDBoody: need more for pgsql
gr8feen: I meant the mni and mns parts...
gr8feen: some of those are not ones you just want to add to... hang on a sec and I'll look them up..
gr8feen: what's semmsl set to?
SDBoody: kern.ipc.semmsl: 1024
SDBoody: kern.ipc.msgseg: 2048 kern.ipc.msgssz: 8 kern.ipc.msgtql: 40 kern.ipc.msgmnb: 2048 kern.ipc.msgmni: 40 kern.ipc.msgmax: 16384 kern.ipc.semaem: 16384 kern.ipc.semvmx: 65534 kern.ipc.semusz: 152 kern.ipc.semume: 10 kern.ipc.semopm: 100 kern.ipc.semmsl: 1024 kern.ipc.semmnu: 512 kern.ipc.semmns: 1024 kern.ipc.semmni: 1024 kern.ipc.semmap: 768 kern.ipc.shm_allow_removed: 0 kern.ipc.shm_use_phys: 1 kern.ipc.shmall: 262144 kern.ipc.shmseg: 256 kern.ipc.shmmni: 784 kern.ipc.shmmin: 1 kern.ipc.shmmax: 536870912 kern.ipc.maxsockets: 25600
gr8feen: ok...msl is max per id, mni is max ids, mns is max number of semaphores... so you probably want something like mns = mni * msl
gr8feen: which one did you run out of?
SDBoody: not sure how to tell- ipcs shows the sems in use add up to 1024
SDBoody: there are 59 entries
gr8feen: I'm assuming you tried to start postgres and it failed?
SDBoody: yes
gr8feen: it should have logged why, somewhere..
gr8feen: if I recall, it'll tell you which one it ran out of
SDBoody: > DETAIL: Failed system call was semget(1, 17, 03600).
gr8feen: so it wanted an id with 17 semaphores...I'd start by making mns = 17*mni and leave mni and mnl set to what they are now and see what it does
SDBoody: i think mni is plenty high
SDBoody: ok, more reasonable 17408
gr8feen: yeah...just change that one and see how it goes..
SDBoody: and leave mni alone at 1024?
gr8feen: yeah...mni id the max number of ids...but if it's trying to get something like 17 per id, your going to hit mns before you hit anything else
SDBoody: right, but doesn't hurt to have it that high (assuming)
gr8feen: not really... I think those get allocated out of ram that you cant page out, but it's still such a small amount that it really doesn't matter
SDBoody: looks like that worked, thx!
gr8feen: cool</pre>

See semaphores: <tt>ipcs -a -s</tt>
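Following up on that conversation, a hedged example of applying and checking the value arrived at above (on some FreeBSD versions these are boot-time tunables that must go in /boot/loader.conf instead of being set live):

<pre>sysctl kern.ipc.semmns=17408     # the mns value worked out in the chat
ipcs -a -s                       # verify semaphore usage afterwards</pre>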