ZFS: i/o error – all block copies unavailable on large disk number machines
We did a minor kernel update on a large storage machine here today which runs FreeBSD 8.2 and to our surprise it failed to boot at the loader with “ZFS: i/o error – all block copies unavailable”.
After some digging we discovered that this was likely because the BIOS only enumerates the first 12 disks, and this machine has more than that in the root zpool, which was a striped raidz2 volume. This in turn means that the bootcode can’t complete and hence the machine can’t boot.
Our solution was to migrate the root fs off the raidz2 volume and onto a mirrored volume on two disks which were accessible from the BIOS.
To do this we created a new zfs pool manually, copied the data using zfs send … | zfs receive .. and then fixed the cache file by importing the pools with the following commands from an mfsbsd cdrom:
zpool import -R /mnt -o cachefile=/boot/zfs/zpool.cache tank
zpool import -R /mnt2 -o cachefile=/boot/zfs/zpool.cache tank2
cp /boot/zfs/zpool.cache /mnt/boot/zfs/zpool.cache
zpool set bootfs=tank/root tank
It would of course be nice if zfs warned or even prevented this.
Updated contextual help in WordPress
If you’re like me and you try to be a good citizen when it comes to writing WordPress plugins, then you’ll write some contextual help/documentation for them that shows up in the appropriate places of the WordPress admin interface.
Since the WordPress 3.3.1 update, however, the method I previously used for this resulted in the help permanently appearing on the pages in question. When you have a lot of plugins and a lot of help text, that can result in the actual admin interface being several full page scrolls away, which can be pretty annoying.
As such, I recently worked out how to quickly adapt the old simple style of adding contextual help to make use of the newer system.
Before:
// add some contextual help in for add/edit post admin pages
add_action('load-post-new.php', 'myplugin_help');
add_action('load-post.php', 'myplugin_help');

function myplugin_help() {
    add_filter('contextual_help', 'load_myplugin_help');
}

function load_myplugin_help($help) {
    echo $help;
    echo "My custom plugin help";
}
After:
// add some contextual help in for add/edit post admin pages
add_action('load-post-new.php', 'myplugin_help');
add_action('load-post.php', 'myplugin_help');

function myplugin_help() {
    add_filter('contextual_help', 'load_myplugin_help');
}

function load_myplugin_help($help) {
    // register the help as its own tab in the help dropdown
    get_current_screen()->add_help_tab( array(
        'id'      => 'myplugin-help',
        'title'   => __('My Plugin Help'),
        'content' => "Help for my plugin"
    ) );

    // a filter callback should hand back the value it was given
    return $help;
}
The benefit of this new way is that the new system nicely sorts all the help into little menus, so rather than having all of your help on one massive page, the help dropdown at the top of the admin interface provides a menu for each plugin, allowing the help section to take up less space and generally be more usable.
How to configure LACP trunk between Cisco and ExtremeNetworks switches
Once you know how, it’s really simple. The key is ensuring that the Extreme end is using “dynamic” sharing, which enables LACP.
The following example configures an LACP trunk of 2 ports between a Cisco 6500 (ports 8/10, 8/11) and an Extreme 400-48t (ports 1, 2).
Cisco Config
interface Port-channel1
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk

interface GigabitEthernet8/10
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode active

interface GigabitEthernet8/11
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 1 mode active
Extreme Config
enable sharing 1 grouping 1,2 dynamic
You can then use “sh ports 1 sharing” to check the Extreme end and “sh etherchannel 1 summary” to check the Cisco end.
gdb logging useful for large backtraces
If you’re trying to output a large backtrace, like those generated by kernel panics, the following can be quite useful:-
set logging redirect on
(gdb) set height 0
(gdb) set logging file backtrace.txt
(gdb) set logging redirect on
(gdb) set logging on
Redirecting output to backtrace.txt.
(gdb) thread apply all bt
FreeBSD security support for ATA devices via camcontrol
Recently we’ve been using a lot of SSDs, and one of the problems with SSDs is that they degrade in performance over time, so much so that in some cases they can barely keep up with basic tasks.
In our experience we’ve seen Sandforce based drives drop from a write rate of 180MB/s to just over 10MB/s, making them all but unusable.
Given this issue and the current lack of TRIM support under ZFS, our file system of choice, we’ve needed to use secure erase on our SSDs to return them to their as-purchased performance.
Unfortunately this meant booting the machine into Linux and using the hdparm command along with the instructions in the ATA Secure Erase wiki article. This is obviously not ideal, so I’ve spent the past two days adding this ability to FreeBSD’s camcontrol utility for ATA devices.
Our current patch for camcontrol, which adds security functions including the secure erase option, can be downloaded here: FreeBSD 8.2 ATA security methods patch for camcontrol
Once you have patched and compiled camcontrol there will be a new “security” option. This allows you to display and configure security on ATA drives when they are connected to an ATA controller, such as ahci, which presents the disks as adaX devices.
To secure erase a disk, the disk first needs to have security enabled, which means setting a ‘user’ password. Using the updated camcontrol this can be done in a single command line.
First find the device name of your SSD with:-
camcontrol devlist
***WARNING*** running the command below will ERASE ALL data on the device ada0, so ensure you have copied off or backed up your data prior to running it.
camcontrol security ada0 --security-user user \
    --security-set-password Erase \
    --security-erase Erase
This will first set the user security password to “Erase”, which enables drive security, and then prompt you to confirm that you want to erase the selected disk.
If you are 100% sure this is what you want, you can also specify the --security-confirm command line option to avoid this confirmation prompt.
It should be noted that there are currently problems with long timeouts, which are used when performing a secure erase, in a large number of FreeBSD 8.2 drivers. Some SSDs don’t actually need long to secure erase but often report that they do; for these you can use the --security-erase-timeout option to override the reported value on kernels which don’t have working long timeouts, as described in my last post.
I hope to get this patch committed to the FreeBSD source at some point, but until then I hope this is of help to other FreeBSD users using SSDs.
Much credit to Daniel Roethlisberger for his work on adding security support to atacontrol, detailed in PR bin/127918 which was the basis of this code.
timeout overflow in CAM / Drivers under FreeBSD 8.2
In the process of updating camcontrol to support security features, including the ability to secure erase an SSD to restore performance, I came across an issue whereby timeouts passed in via the cam layer overflow above 2147 seconds, resulting in instant timeouts.
This is caused by an integer overflow at the driver level when converting the msec timeout value to ticks before passing it to timeout, callout_reset and friends. After discussion on the freebsd-hackers list, a fix was created by Eygene Ryabinkin and updated by myself to support all drivers.
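To make the 2147 second limit concrete, here is a rough simulation of the arithmetic. It is only a sketch under assumptions: it assumes the drivers compute ticks as (msec * hz) / 1000 in a signed 32-bit int and that hz is 1000; the function names are made up for illustration and are not taken from the patch or any driver.

```python
def to_int32(value):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def ms_to_ticks_32bit(timeout_ms, hz=1000):
    """Assumed driver-style conversion: the product is held in a 32-bit int."""
    product = to_int32(timeout_ms * hz)
    # C integer division truncates toward zero, so mimic that here.
    return int(product / 1000)


def ms_to_ticks_wide(timeout_ms, hz=1000):
    """The same conversion with enough width that the product cannot wrap."""
    return (timeout_ms * hz) // 1000


if __name__ == "__main__":
    for seconds in (2147, 2148):
        ms = seconds * 1000
        print(seconds, ms_to_ticks_32bit(ms), ms_to_ticks_wide(ms))
```

Under these assumptions a 2147 second timeout still converts to a positive tick count, while 2148 seconds wraps to a negative one, which is consistent with the timeout firing instantly.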
Download the cam ccb timeout issue patch updated by Eygene Ryabinkin + mps driver support (Updated 22/10/2011)
For more information see the FreeBSD Hackers mailing list archive thread: cam / ata timeout limited to 2147 due to overflow bug?
php Segmentation fault (core dumped) in xmlFreeMutex
Having just rebuilt a clean install of php 5.3.6, it was crashing left, right and center; even a php -m caused it.
The stack looks like the following under gdb
#0 0x0000000103db8600 in ?? ()
#1 0x0000000100ce1a95 in xmlFreeMutex () from /usr/local/lib/libxml2.so.5
#2 0x0000000100ce14d5 in xmlCleanupGlobals () from /usr/local/lib/libxml2.so.5
#3 0x0000000100c79f4a in xmlCleanupParser () from /usr/local/lib/libxml2.so.5
#4 0x000000000044daa8 in php_libxml_shutdown ()
#5 0x000000000044dad9 in zm_shutdown_libxml ()
#6 0x000000000055289f in module_destructor ()
#7 0x0000000000559be4 in zend_hash_apply_deleter ()
#8 0x0000000000559e58 in zend_hash_graceful_reverse_destroy ()
#9 0x000000000054dfc8 in zend_shutdown ()
#10 0x00000000004fbb9a in php_module_shutdown ()
#11 0x00000000005d55e2 in main ()
#12 0x0000000000417125 in _start ()
The problem is caused by libxml2 under FreeBSD being compiled as threaded by default, whereas php isn’t.
The fix is simple: select the LINKTHR option from make config and recompile php and, to be safe, its modules.
IPMI under FreeBSD is easy!
It’s easy. First you need the ipmi kernel module loaded, e.g. kldload ipmi. If it’s not already available, add the ipmi kernel module and its dependency i2c (for smbus) to your kernel and recompile.
Next build / install the ipmitool package (/usr/ports/sysutils/ipmitool).
That’s all; see man ipmitool for the myriad of options.
Finding the cause of kldload “Exec format error”
I’ve just been trying to get ipmitool working on our custom kernel, which requires the ipmi kernel module to be loaded. However, after adding this module to our kernel config it refused to load, giving the following error:-
kldload ipmi.ko
kldload: can't load ipmi.ko: Exec format error
Not exactly helpful
After much digging I finally figured out that the ipmi module depends on the smbus module, so I added this and in doing so noticed some messages on the console of the machine (I’d been working over ssh). These messages told me exactly what the issue was:-
Jun 6 23:27:47 test kernel: KLD ipmi.ko: depends on smbus - not available or version mismatch
Jun 6 23:27:47 test kernel: linker_load_file: Unsupported file type
If only kldload output those to the controlling terminal as well as the main console and /var/log/messages, I would not have spent ages searching for the solution.
The moral of this little tale is: if you have kldload spit out “Exec format error” at you, check /var/log/messages to find out why.
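If you want to automate that check, something along these lines would do. It is a minimal sketch, assuming the kernel messages keep the wording shown above; the function name and pattern are my own, not part of any FreeBSD tool.

```python
import re

# Pattern based on the console message quoted above (assumed to be stable).
KLD_DEP = re.compile(
    r"KLD (?P<module>\S+): depends on (?P<dep>\S+) - "
    r"not available or version mismatch"
)


def find_missing_deps(lines):
    """Return (module, dependency) pairs for each KLD dependency failure."""
    found = []
    for line in lines:
        match = KLD_DEP.search(line)
        if match:
            found.append((match.group("module"), match.group("dep")))
    return found


if __name__ == "__main__":
    # Reading /var/log/messages usually needs suitable permissions.
    with open("/var/log/messages", errors="replace") as log:
        for module, dep in find_missing_deps(log):
            print(f"{module} needs {dep}")
```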
“version” perl Module Upgrade on SFU results in broken perl!
For those of you who use SFU, a useful and freely available POSIX layer for Windows, and want to update / install perl modules using CPAN: be aware that updating the “version” module can lead to a broken perl install. The resulting symptom is that perl commands hang, doing nothing. The cause is that you end up with two “version” module folders, one lower-case “version” and one upper-case “Version”.
The fix is simply to remove the old “Version” folder from /usr/local/lib/perl5/site_perl/
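To check whether a directory has this kind of clash, a quick scan for names that differ only by case can help. This is just an illustrative sketch; the helper names are made up, and you should point it at whichever site_perl directory your install actually uses.

```python
import os
from collections import defaultdict


def find_case_collisions(names):
    """Group names that are identical apart from upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


def scan_directory(path):
    """List the entries of one directory and report any case-only clashes."""
    return find_case_collisions(os.listdir(path))


if __name__ == "__main__":
    for clash in scan_directory("/usr/local/lib/perl5/site_perl/").values():
        print("case clash:", ", ".join(clash))
```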