Bad hard disk I/O performance on systems with Adaptec SCSI controllers

Support knowledgebase (jsj_adaptec_performance_70)
Applies to

SuSE Linux: Versions up to (including) 7.0
This article refers to an older version of SuSE Linux. Some of the information in it may therefore be outdated, and the article may contain stale links.

Symptom:

You may experience very poor hard disk I/O performance on systems with an Adaptec SCSI controller, e.g. certain IBM Netfinity and xSeries systems.

Cause:

The culprit appears to be a compile-time option that is not set in the aic7xxx SCSI driver of the SuSE kernels: a feature called "Tagged Command Queueing" (TCQ) is not enabled by default. We decided not to enable this feature in earlier kernel versions because of the warnings in the configuration help text

(/usr/src/linux/Documentation/Configure.help):
CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT
This option causes the aic7xxx driver to attempt to use tagged command
queueing on any devices that claim to support it.  If this is set to yes,
you can still turn off TCQ on troublesome devices with the use of the
tag_info boot parameter.  See /usr/src/linux/drivers/scsi/README.aic7xxx
for more information on that and other aic7xxx setup commands.  If this
option is turned off, you may still enable TCQ on known good devices by
use of the tag_info boot parameter.

If you are unsure about your devices then it is safest to say N here.

However, TCQ can increase performance on some hard drives by as much
as 50% or more, so I would recommend that if you say N here, that you
at least read the README.aic7xxx file so you will know how to enable
this option manually should your drives prove to be safe in regards
to TCQ.

Conversely, certain drives are known to lock up or cause bus resets when
TCQ is enabled on them.  If you have a Western Digital Enterprise SCSI
drive for instance, then don't even bother to enable TCQ on it as the
drive will become unreliable, and it will actually reduce performance.

Leaving this option off by default appears to severely reduce hard disk performance on IBM hard disk drives; so far we have not seen similarly bad values on other systems. For this reason, and because TCQ can still be disabled with a boot parameter, we enable this feature by default in all SuSE kernel packages from 7.1 on.

Solution:

To check whether this option is enabled in the currently running kernel, issue the following command (this works at least in newer versions of SuSE Linux, which include the "cloneconfig" kernel patch):
 zgrep TCQ /proc/config.gz
On older versions, please check the kernel configuration file.

On a SuSE 7.0 system with the original kernel, you will get the following output:
# CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT is not set
SuSE Linux 7.1 and other systems with this feature enabled will show:
CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y
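
The check above can be wrapped in a small helper. This is only a sketch: the function name tcq_default_state is hypothetical, and it assumes the configuration text contains one of the two lines shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SuSE Linux): reads kernel configuration
# text on stdin and reports the compile-time TCQ default.
# Prints "enabled", "disabled", or "unknown" (option not found at all).
tcq_default_state() {
    awk '
        /^CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y/            { state = "enabled" }
        /^# CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT is not set/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage on a kernel that includes the "cloneconfig" patch:
#   zcat /proc/config.gz | tcq_default_state
```

On older versions without /proc/config.gz, the same function could be fed the kernel configuration file instead.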
Fortunately, it is not necessary to recompile the kernel to enable this feature. You can pass a boot parameter to enable TCQ on specific drives.
From the aic7xxx documentation (/usr/src/linux/drivers/scsi/README.aic7xxx):
         "aic7xxx=tag_info:{{8,8..},{8,8..},..}"

 This option is used to disable or enable Tagged Command Queueing
 (TCQ) on specific devices.  As of driver version 5.1.11, TCQ is now
 either on or off by default according to the setting you choose
 during the make config process.  In order to en/disable TCQ for
 certian devices at boot time, a user may use this boot param.  The
 driver will then parse this message out and en/disable the specific
 device entries that are present based upon the value given. The
 param line is parsed in the following manner:


   { - first instance indicates the start of this parameter values
           second instance is the start of entries for a particular
           device entry
   } - end the entries for a particular host adapter, or end the
           entire set of parameter entries
   , - move to next entry.  Inside of a set of device entries, this
           moves us to the next device on the list.  Outside of device
           entries, this moves us to the next host adapter
   . - Same effect as , but is safe to use with insmod.
   x - the number to enter into the array at this position.
           0 = Enable tagged queueing on this device and use the
           default queue depth
           1-254 = Enable tagged queueing on this device and use this
                   number as the queue depth
           255 = Disable tagged queueing on this device.
           Note: anything above 32 for an actual queue depth is
           wasteful and not recommended.

 A few examples of how this can be used:

 tag_info:{{8,12,,0,,255,4}}

   This line will only effect the first aic7xxx card registered.  It
   will set scsi id 0 to a queue depth of 8, id 1 to 12, leave id 2
   at the default, set id 3 to tagged queueing enabled and use the
   default queue depth, id 4 default, id 5 disabled, and id 6 to 4.
   Any not specified entries stay at the default value, repeated
   commas with no value specified will simply increment to the next
   id without changing anything for the missing values.


 tag_info:{,,,{,,,255}}

   First, second, and third adapters at default values.  Fourth
   adapter, id 3 is disabled.  Notice that leading commas simply
   increment what the first number effects, and there are no need for
   trailing commas.  When you close out an adapter, or the entire
   entry, anything not explicitly set stays at the default value.

 A final note on this option.  The scanner I used for this isn't
 perfect or highly robust.  If you mess the line up, the worst that
 should happen is that the line will get ignored.  If you don't close
 out the entire entry with the final bracket, then any other aic7xxx
 options after this will get ignored.  So, in general, be sure of
 what you are entering, and after you have it right, just add it to
 the lilo.conf file so there won't be any mistakes.  As a means of
 checking this parser, the entire tag_info array for each card is now
 printed out in the /proc/scsi/aic7xxx/x file.  You can use that to
 verify that your options were parsed correctly.
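
To double-check a tag_info value before putting it into lilo.conf, the per-device rules quoted above can be spelled out with a few lines of shell. This is a rough sketch, not the driver's parser: the function name describe_tag_entries is hypothetical, and it only handles the entries for ONE adapter (the text between the inner braces), with either "," or "." as separator.

```shell
#!/bin/sh
# Hypothetical helper: explain the entries for a single host adapter.
# $1 is the inner list only, e.g. "8,12,,0,,255,4" (no braces).
describe_tag_entries() {
    printf '%s\n' "$1" | awk -F'[,.]' '
    {
        for (i = 1; i <= NF; i++) {
            id = i - 1                      # SCSI IDs start at 0
            if ($i == "")       what = "left at default"
            else if ($i == 0)   what = "TCQ enabled, default queue depth"
            else if ($i == 255) what = "TCQ disabled"
            else                what = "TCQ enabled, queue depth " $i
            printf "id %d: %s\n", id, what
        }
    }'
}

# First README example, tag_info:{{8,12,,0,,255,4}}:
describe_tag_entries '8,12,,0,,255,4'
```

For the first README example this prints one line per SCSI ID 0 through 6, matching the README's description. The authoritative check remains the tag_info array printed in /proc/scsi/aic7xxx/x after booting.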

Enabling TCQ via the boot parameter dramatically increased the performance on the tested systems.
If your kernel has the aic7xxx driver compiled in, you can pass this parameter at the boot prompt or add it to the "append" line in /etc/lilo.conf (rerun LILO afterwards to apply the change).
If you load the driver as a module from an initial RAM disk (initrd), add the parameter to /etc/modules.conf (options aic7xxx ...) and then recreate the initial RAM disk by executing "mk_initrd".

Make sure that your boot manager is updated to use the new initrd! If you use LILO, reinstall it from the command line with the command:
lilo
Hint: Our customer Herr Griem told us that you may run into difficulties when using an initial RAM disk because of the comma (,) delimiter. /usr/src/linux/drivers/scsi/README.aic7xxx states:
   Module Loading command options
   ------------------------------
    When loading the aic7xxx driver as a module, the exact same options are
    available to the user.  However, the syntax to specify the options changes
    slightly.  For insmod, you need to wrap the aic7xxx= argument in quotes
    and replace all ',' with '.'.  So, for example, a valid insmod line
    would be:
 
    insmod aic7xxx aic7xxx='verbose.irq_trigger:1.extended'
So when the driver is loaded as a module, please replace all commas in the parameter with periods.
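
The comma-to-period conversion is easy to get wrong by hand, so here is a minimal sketch of it. The function name to_module_syntax and the queue depth of 8 for four devices are example assumptions only. The README covers quoting for insmod; whether /etc/modules.conf needs quotes is not covered by it and may depend on your modutils version.

```shell
#!/bin/sh
# Hypothetical helper: convert a boot-prompt/lilo.conf style aic7xxx
# parameter into the form that is safe for module loading (',' -> '.').
to_module_syntax() {
    printf '%s\n' "$1" | tr ',' '.'
}

# Example value only: queue depth 8 for IDs 0-3 on the first adapter.
lilo_value='aic7xxx=tag_info:{{8,8,8,8}}'
module_value=$(to_module_syntax "$lilo_value")

# Print the two lines for manual review; nothing is written to disk.
echo "lilo.conf:    append=\"$lilo_value\""
echo "modules.conf: options aic7xxx $module_value"
```

The script only prints the candidate lines so that you can review them before editing /etc/lilo.conf or /etc/modules.conf yourself.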
Keywords: TAGGED, COMMAND, QUEUEING, IBM, NETFINITY, AIC7XXX, TCQ, ADAPTEC, DISK

Categories: SCSI

SDB-jsj_adaptec_performance_70, Copyright SuSE Linux AG, Nürnberg, Germany - Version: 25. Jan 2001
SuSE Linux AG - Last generated: 08. Feb 2001 by jsj (sdb_gen 1.40.0)