<html><head><meta name="color-scheme" content="light dark"></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">Oracle bug 9764220
July 22, 2010
Chuck Anderson
chuck.anderson@oracle.com

Fix for Oracle bug 9764220.
A 64-bit EL5u5 PV guest running on an OVM 2.2/2.2.1 system with a Qlogic qle2462
PCI-E FC HBA attached will not shut down cleanly.  The following messages
appear on a serial console:

	Unmounting file systems:  
	Halting system...
	md: stopping all md devices.
	Synchronizing SCSI cache for disk sda:
	pcifront pci-0: pciback not responding!!!
	xenbus_dev_shutdown: device/vif/0 timeout closing device
	pcifront pci-0: pciback not responding!!!
	pcifront pci-0: pciback not responding!!!
	[repeats]

I put a BUG() in do_pci_op() (drivers/xen/pcifront/pci_op.c) at the point where
the "pcifront pci-0: pciback not responding!!!" message is issued, so that the
system would crash and produce a call stack:

RIP: e030:[&lt;ffffffff803bcd14&gt;]  [&lt;ffffffff803bcd14&gt;] do_pci_op+0x215/0x264
Call Trace:
  &lt;IRQ&gt;  [&lt;ffffffff80288839&gt;] __activate_task+0x56/0x6d
  [&lt;ffffffff803bd48f&gt;] pcifront_bus_read+0xe3/0x17d
  [&lt;ffffffff8034a15d&gt;] pci_bus_read_config_word+0x57/0x83
  [&lt;ffffffff8817309a&gt;] :qla2xxx:qla2x00_timer+0x0/0x29b
  [&lt;ffffffff881730d9&gt;] :qla2xxx:qla2x00_timer+0x3f/0x29b
  [&lt;ffffffff8029320e&gt;] run_timer_softirq+0x197/0x249
  [&lt;ffffffff80212cd3&gt;] __do_softirq+0x8d/0x13b
  [&lt;ffffffff80260da4&gt;] call_softirq+0x1c/0x278
  [&lt;ffffffff8026e0c1&gt;] do_softirq+0x31/0x98
  [&lt;ffffffff8026df4d&gt;] do_IRQ+0xec/0xf5
  [&lt;ffffffff803b3fe6&gt;] evtchn_do_upcall+0x13b/0x1fb
  [&lt;ffffffff802608d6&gt;] do_hypervisor_callback+0x1e/0x2c
  &lt;EOI&gt;  [&lt;ffffffff802063aa&gt;] hypercall_page+0x3aa/0x1000
  [&lt;ffffffff802063aa&gt;] hypercall_page+0x3aa/0x1000
  [&lt;ffffffff8026f4eb&gt;] raw_safe_halt+0x84/0xa8
  [&lt;ffffffff8026ca80&gt;] xen_idle+0x38/0x4a
  [&lt;ffffffff8024ad7b&gt;] cpu_idle+0x97/0xba
  [&lt;ffffffff8064eb0f&gt;] start_kernel+0x21f/0x224
  [&lt;ffffffff8064e1e5&gt;] _sinittext+0x1e5/0x1eb

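The trace shows qla2x00_timer() running from the timer softirq and issuing a
PCI config read, which pcifront_bus_read() passes to do_pci_op().
qla2x00_timer() has the following code: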

void
qla2x00_timer(scsi_qla_host_t *ha)
{
	scsi_qla_host_t *pha = to_qla_parent(ha);
	unsigned long   cpu_flags = 0;
	fc_port_t       *fcport;
	int             start_dpc = 0;
	int             index;
	srb_t           *sp;
	int             t;
	uint16_t        w;

	/* Hardware read to raise pending EEH errors during mailbox waits. */
	if (!pci_channel_offline(pha-&gt;pdev))
		pci_read_config_word(ha-&gt;pdev, PCI_VENDOR_ID, &amp;w);

According to Qlogic, the pci_read_config_word() call is made to trigger pending
EEH errors so that they may be handled more quickly on PPC64; it serves no real
purpose on X86.  The patch below wraps that call in an "#ifdef CONFIG_PPC64"
so that it is not made on X86.
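
As a rough userspace illustration of the effect of the #ifdef (this is not
driver code; every mock_* name, config_reads and reads_after_ticks() are
hypothetical stand-ins invented for this sketch), the guarded read is compiled
out entirely unless CONFIG_PPC64 is defined:

```c
/* Userspace sketch only.  Nothing here is part of the qla2xxx driver. */

/* Number of simulated PCI config-space reads issued so far. */
static int config_reads;

/* Hypothetical stand-in for pci_read_config_word(); it only counts calls. */
void mock_read_config_word(void)
{
	config_reads++;
}

/* Hypothetical stand-in for the top of qla2x00_timer(). */
void mock_timer_tick(void)
{
#ifdef CONFIG_PPC64
	/* Built only when CONFIG_PPC64 is defined, as in the patch. */
	mock_read_config_word();
#endif /* CONFIG_PPC64 */
}

/* Run n timer ticks (n must not be negative) and return the read count. */
int reads_after_ticks(int n)
{
	int i;

	config_reads = 0;
	for (i = 0; i != n; i++)
		mock_timer_tick();
	return config_reads;
}
```

Built without CONFIG_PPC64, reads_after_ticks(5) returns 0, so no config read
would reach pcifront from the timer.  Built with -DCONFIG_PPC64, it returns 5,
one read per tick.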

diff -uNrp linux-2.6.18.x86_64.orig/drivers/scsi/qla2xxx/qla_os.c linux-2.6.18.x86_64/drivers/scsi/qla2xxx/qla_os.c
--- linux-2.6.18.x86_64.orig/drivers/scsi/qla2xxx/qla_os.c	2010-11-18 14:14:44.000000000 -0800
+++ linux-2.6.18.x86_64/drivers/scsi/qla2xxx/qla_os.c	2010-11-18 23:07:24.000000000 -0800
@@ -3480,9 +3480,11 @@ qla2x00_timer(scsi_qla_host_t *ha)
 		return;
 	}
 
+#ifdef CONFIG_PPC64
 	/* Hardware read to raise pending EEH errors during mailbox waits. */
 	if (!pci_channel_offline(pha-&gt;pdev))
 		pci_read_config_word(ha-&gt;pdev, PCI_VENDOR_ID, &amp;w);
+#endif /* CONFIG_PPC64 */
 
 	if (!ha-&gt;parent &amp;&amp; IS_QLA82XX(ha))
 		qla82xx_watchdog(ha);
</pre></body></html>