- (tcq, general) Remove the 'attempt to keep queue full' option. It worked
on some IBM models, but failed miserably on others. Also removes some
uglies in ide_queue_commands()
- (tcq) Change default depth back to 32.
- (general) Add isr for no-dataphase taskfile, like task_no_data_intr but
doesn't complain about failure. This is handy for commands that we _know_
will fail, such as WIN_NOP.
- (general) ide_cmd_type_parser() must set a handler for WIN_NOP... Otherwise
we will just hang the ide system issuing a nop.
- (general) Have ide_raw_taskfile() copy back the taskfile after execution,
otherwise we cannot use the info that ide_end_drive_cmd() puts in
there.
- (tcq) Use nIEN bit correctly in ide-tcq
- (tcq) Small ide_tcq_wait_altstat() changes. Do initial 400ns delay (1us
here), then 10us each successive run.
- (tcq) Add beginning for 'nop auto poll' support check.
- (tcq) Arm handler before GET_STAT() service check in
ide_dma_queued_start, WD seemed to trigger interrupt before that.
Makes WD Expert drives work with tcq.
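The ide_tcq_wait_altstat() timing described above (one short initial delay, then a longer delay between successive polls) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the names sketch_wait_not_busy, SKETCH_BUSY_BIT and the fake device are hypothetical and are not taken from ide-tcq; only the 1us / 10us delays come from the changelog entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUSY_BIT  0x80u  /* hypothetical "busy" flag in the status byte */
#define SKETCH_READY     0x40u  /* hypothetical "ready" status value */

typedef uint8_t (*read_status_fn)(void *ctx);
typedef void (*delay_us_fn)(void *ctx, unsigned int usecs);

/*
 * Poll a status register until the busy bit clears.
 * Returns 0 and stores the final status on success, -1 on timeout.
 */
static int sketch_wait_not_busy(read_status_fn read_status, delay_us_fn delay_us,
                                void *ctx, unsigned int max_polls,
                                uint8_t *status_out)
{
    assert(read_status != NULL && delay_us != NULL && status_out != NULL);

    /* Initial settle time: 400ns is required, rounded up to 1us here. */
    delay_us(ctx, 1);

    for (unsigned int i = 0; i < max_polls; i++) {
        uint8_t status = read_status(ctx);

        if (!(status & SKETCH_BUSY_BIT)) {
            *status_out = status;
            return 0;
        }
        /* Each successive run waits 10us before looking again. */
        delay_us(ctx, 10);
    }
    return -1;
}

/* Fake device for experimenting: reports busy for a fixed number of reads. */
struct sketch_fake_dev {
    unsigned int busy_reads_left;
    unsigned int total_delay_us;
};

static uint8_t sketch_fake_read(void *ctx)
{
    struct sketch_fake_dev *dev = ctx;

    if (dev->busy_reads_left > 0) {
        dev->busy_reads_left--;
        return SKETCH_BUSY_BIT;
    }
    return SKETCH_READY;
}

static void sketch_fake_delay(void *ctx, unsigned int usecs)
{
    struct sketch_fake_dev *dev = ctx;

    dev->total_delay_us += usecs;  /* record instead of really sleeping */
}
```

With the fake device set to report busy three times, the loop accumulates 1 + 3 * 10 = 31us of delay before it sees the ready status.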
Erich Focht [Thu, 18 Apr 2002 12:21:36 +0000 (05:21 -0700)]
[PATCH] more migration thread cleanups
I'm currently working on a node affine scheduler extension for NUMA
machines and the load balancer behaves a bit differently from the original.
So after a few boot failures with those slowly booting 16 CPU IA64
machines I thought there must be a simpler solution than synchronizing and
waiting for the load balancer: just let migration_CPU0 do what it is
designed for. So my proposal is:
- start all migration threads on CPU#0
- initialize migration_CPU0 (trivial, reliable, as it already is on
the right CPU)
- let all other migration threads use set_cpus_allowed() to get to the
right place
The only synchronization needed is the non-zero migration threads waiting
for migration_CPU0 to start working, which it will, as it is already on
the right CPU. This saves quite some lines of code.
I first posted this to LKML on March 6th (BTW, the fix #1, too) and since
then it was tested on several big NUMA platforms: 16 CPU NEC AzusA (IA64)
(also known as HP rx....), up to 32 CPU SGI IA64, 16 CPU IBM NUMA-Q
(IA32). No more lock-ups at boot since then. So I consider it working.
There is another good reason for this approach: the integration of the CPU
hotplug patch with the new scheduler becomes easier. One just needs to
create the new migration thread, it will move itself to the right CPU
without any additional magic (which you otherwise need because of the
synchronizations which won't be there at hotplug). Kimi Suganuma in the
neighboring cube is fiddling this out currently.
Robert Love [Thu, 18 Apr 2002 09:01:31 +0000 (02:01 -0700)]
[PATCH] migration thread fix
Attached is a patch that disables interrupts while holding the rq_lock.
This is certainly needed to prevent a race against the timer tick, as
Erich Focht pointed out.
François Romieu [Thu, 18 Apr 2002 07:57:44 +0000 (00:57 -0700)]
[PATCH] 2.4.8 - dscc4 update 7/13
- dscc4_do_action() now looks like the other event waiting loops (may be
called from interrupt context however);
- dscc4_start_xmit(): cosmetic before LxDA changes + mb() paranoia;
- dscc4_clock_setting(): only one return point, thanks;
- dscc4_priv() invocation removed from dscc4_xxx_settings;
- minor cleanups.
François Romieu [Thu, 18 Apr 2002 07:57:30 +0000 (00:57 -0700)]
[PATCH] 2.4.8 - dscc4 update 5/13
- DEBUG_PARANOIA was bad. "if (debug > x) {" is nice;
- state_check() now has only one return point;
- try_get_rx_skb() cosmetic;
- dscc4_rx_update() belongs to HOLD mode to LxDA changes;
- dscc4_wait_ack_cec() behaves like dscc4_xpr_ack();
- dscc4_rx_skb() refill logic is ready for LxDA mode and does everything
to fulfill what its name suggests.
- document some errata voodoo in dscc4_init_one();
- dscc4_init_ring() should handle try_get_rx_skb() failure.
François Romieu [Thu, 18 Apr 2002 07:57:21 +0000 (00:57 -0700)]
[PATCH] 2.4.8 - dscc4 update 4/13
- dscc4_xpr_ack() busy waiting loop is modified so as to allow long
delay without chewing too many cycles;
- more errata sheet magic;
- dscc4_set_clock() now has only one return point.
François Romieu [Thu, 18 Apr 2002 07:57:04 +0000 (00:57 -0700)]
[PATCH] 2.4.8 - dscc4 update 2/13
- dscc4_patch_register() turns into scc_patchl() and should now avoid a
hardware bug quoted in errata sheet;
- dscc4_init_registers() interface changes as no caller really needs
to poke into dscc4_dev_priv internals;
- scc_{writel/readl}() are added to access some buggy behaving registers;
- {read/write}l conversion to the previous functions
- dscc4_do_tx() sneaks, sorry. Belongs to HOLD -> LxDA changes.
Martin J. Bligh [Thu, 18 Apr 2002 07:54:39 +0000 (00:54 -0700)]
[PATCH] stop NULL pointer dereference in __alloc_pages
This trivial patch will apply to both 2.4.19-pre7 and 2.5.8 with just line
offsets. It stops us from following a NULL pointer in classzone in the case
where there is a pgdat without a fully populated zone list (ie a node with
no ZONE_NORMAL on an ia32 NUMA machine). Without this patch, ia32
NUMA machines won't even boot - we dereference the classzone ptr
a few lines further down (or try to ;-) ).
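The kind of guard this patch describes can be sketched as below. This is a simplified user-space model, not the actual __alloc_pages() code: the struct layouts, the need_balance field and the function name sketch_alloc_from are assumptions made for illustration. The point is only that the first entry of a NULL-terminated zone list is checked before anything dereferences it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ZONES 3

/* Hypothetical, heavily reduced stand-in for a memory zone. */
struct sketch_zone {
    unsigned long free_pages;
    unsigned long pages_low;
    int need_balance;
};

/* NULL-terminated list of zones; may be empty on a node without usable zones. */
struct sketch_zonelist {
    struct sketch_zone *zones[SKETCH_MAX_ZONES + 1];
};

/* Returns a zone with pages above its low watermark, or NULL if none. */
static struct sketch_zone *sketch_alloc_from(struct sketch_zonelist *zonelist)
{
    assert(zonelist != NULL);

    struct sketch_zone **zone = zonelist->zones;
    struct sketch_zone *classzone = *zone;

    /* The guard: an empty zone list means there is nothing to allocate from. */
    if (classzone == NULL)
        return NULL;

    for (; *zone != NULL; zone++) {
        if ((*zone)->free_pages > (*zone)->pages_low)
            return *zone;
    }

    /* Only safe because classzone was checked above. */
    classzone->need_balance = 1;
    return NULL;
}
```

Without the early return, the final assignment would follow a NULL classzone whenever the list is empty, which corresponds to the failure mode described above.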
Andrew Morton [Thu, 18 Apr 2002 06:28:44 +0000 (23:28 -0700)]
[PATCH] pagecache locking bugfix
The bug which Anton found. On the
find_or_create_page->__find_lock_page path we're performing
a read_unlock of an rwlock which is held for writing.
The patch converts that to using a write_lock throughout.
Which penalises find_lock_page() a bit. If it shows up
on profiles then we can clone __find_lock_page() and
use read_lock()s, but for now I'd opt for saving the
cache footprint.
Martin Dalecki [Thu, 18 Apr 2002 02:53:10 +0000 (19:53 -0700)]
[PATCH] 2.5.8 IDE 38
- Fix typo in ide_cmd_ioctl().
- Fix typo in cris driver.
- Don't retry operations on medium errors. (pointed out by Eric Andersen).
- Attach the no_io_32bit, io_32bit, no_unmask, unmask and slow fields to the
ata_channel instead of the ata_device structure. They are a property of the
channel and not just the devices attached to it. This allowed us to fix the
set_io_32bit function by removing the CONFIG_BLK_DEV_DTC2278 conditional. In
fact initialization shows that this is fixing many other host chipsets as
well since all of them did expect sometimes particular values for those
parameters in parallel on both drives attached to a channel but we were
allowed to apply different values on a per drive basis.
- The keep_settings flag is now unconditional and we don't mess with any
channel parameters before drive reset. Some chipsets really really expect
unconditionally that the tweaks they apply are always present and this wasn't
honoured thus far! We are expecting the user to have good reasons for
manually tweaking the settings.
- Don't reset io_32bit in ata_pre_reset() unconditionally. There are chipsets
out there which expect io_32bit to be *always* enabled!
- Remove much obsolete and nowadays just confusing documentation from ide.txt
Dave Hansen [Tue, 16 Apr 2002 06:47:47 +0000 (23:47 -0700)]
[PATCH] fix ips driver compile problems
This patch has been floating inside IBM for a bit, but it appears
that no one passed it back up to you, yet. I don't know who wrote it,
but it applies to 2.5.8 and the ServeRAID driver works just fine with it
applied. Without it, the driver fails to compile.
David Mosberger [Tue, 16 Apr 2002 06:47:27 +0000 (23:47 -0700)]
[PATCH] prctl() patch
This is the patch to add support for PR_SET_FPEMU/PR_GET_FPEMU to give
per-process control over fp-emulation handling. It also cleans up the
way PR_SET_UNALIGN_CTL/PR_GET_UNALIGN_CTL are implemented.
Randy observed that 2.5 /proc/meminfo SwapFree holds steady
while SwapTotal goes up and down: wrong way round!
Andrew pointed to wrong conditional in si_swapinfo(): 2.5.4
updated it incorrectly when new flags bit was briefly added.
And pointed out that it also makes si_swapinfo much too slow.
Note patch compiles but is otherwise untested as no kernel after 2.5.7
boots on my 2.5 box due to IDE hanging the box hard during device
discovery. )-:
Martin Dalecki [Tue, 16 Apr 2002 06:28:45 +0000 (23:28 -0700)]
[PATCH] 2.5.8 IDE 37
- Don't abuse the sense field for passing failed packet_commands in struct
packet_command; use a new field instead.
- Apply minor bits forwarded by Dave Jones to me.
- Fix ide_raw_taskfile() to flag the ar used there as not subject to free_req
list management. This solves the "hang after /proc/ide read" problem, which
was in fact a memory corruption problem.
Martin Dalecki [Tue, 16 Apr 2002 06:28:33 +0000 (23:28 -0700)]
[PATCH] 2.5.8 IDE 36
- Consolidate ide_choose_drive() and choose_drive() into one function.
- Remove sector data byte-swapping support. Byte-swapping the data is supported
on the file-system level where applicable. Byte-swapped interfaces are
supported on a lower level anyway. And finally it was used inconsistently.
- Eliminate taskfile_input_data() and taskfile_output_data(). This allowed us
to split up ideproc and eliminate the ugly action switch as well as the
corresponding defines.
- Remove tons of unnecessary typedefs from ide.h
- Prepare the PIO read/write code for an upcoming overhaul.
We use the makefile variable $(foo-objs) to list the objects
a composed module foo.o is supposed to be composed of.
We use the special variable $(export-objs) to list the object files which
export symbols.
This obviously clashes in the case of foo == export. There are basically
two ways to handle it: (1) rename one of these options, like
foo-objs to foo-parts or something, or (2) simply disallow a composite
object called export.o, so you never need $(export-objs) to list its
parts.
As (1) would affect basically all Makefiles in the tree and (2) doesn't
seem much of a limitation, I went for (2).
Oliver Neukum [Tue, 16 Apr 2002 04:13:50 +0000 (21:13 -0700)]
[PATCH] kaweth usb driver updates
USB kaweth driver updates
- fixed race between close and disconnect
- disconnect bug
- add link state reporting
- fix an urb reference counting bug
- fix probe oopsability on oom
- groundwork for atomic pool depletion
- cosmetic changes
Neil Brown [Mon, 15 Apr 2002 15:33:55 +0000 (08:33 -0700)]
[PATCH] PATCH - Create "export_operations" interface for filesystems to describe
Create "export_operations" interface for filesystems to describe
whether and how they should be exported.
- add new field in struct super_block "s_export_op" to describe
how a filesystem is exported (i.e. how filehandles are mapped to
dentries).
- New module: fs/exportfs for holding helper code for mapping between
filehandles and dentries
- Change nfsd to use new interface if it exists.
- Change ext2 to provide new interface
- Add documentation to filesystems/Exporting
If s_export_op isn't set, old mechanism still works, but it is
planned to remove old method and only use s_export_op.
Neil Brown [Mon, 15 Apr 2002 15:33:51 +0000 (08:33 -0700)]
[PATCH] dcache changes for preparing for "export_operations" interface for nfsd to use.
Prepare for new export_operations interface (for filehandle lookup):
- define d_splice_alias and d_alloc_anon.
- define shrink_dcache_anon for removing anonymous dentries
- modify d_move to work with anonymous dentries (IS_ROOT dentries)
- modify d_find_alias to avoid anonymous dentries where possible
as d_splice_alias and d_alloc_anon use this
- put in place infrastructure for s_anon allocation and cleaning
- replace a piece of code that is in nfsfh, reiserfs and fat
with a call to d_alloc_anon
- Rename DCACHE_NFSD_DISCONNECTED to DCACHE_DISCONNECTED
- Add documentation at Documentation/filesystems/Exporting
[PATCH] various pegasus and rtl8150 fixes and improvements
USB pegasus and rtl8150 fixes and improvements
pegasus:
- using preallocated skb thus avoiding memcpy in the receive path;
- tasklet used to handle failed skb allocations and Rx urb submission;
- Lindent run on the result.
rtl8150:
- better tasklet handling and a few races fixed;
- introducing new flag for Rx urb resubmission;
- GFP_KERNEL to GFP_ATOMIC flag change in Tx path.
Robert Love [Mon, 15 Apr 2002 06:59:56 +0000 (23:59 -0700)]
[PATCH] migration_thread preempt fix
This fixes a race in migration_thread which results in a deadlock on
boot for some SMP systems. The fix is to disable preemption inside
of set_cpus_allowed.
Andrew Morton first noticed the problem and provided the following patch
a few weeks back. I was not affected until the recent migration_init
fix, for some odd reason. Neither Andrew nor I think this is actually
kernel preemption's fault but perhaps a race in the tricky behavior of
the migration code.
Dave Hansen [Mon, 15 Apr 2002 06:31:05 +0000 (23:31 -0700)]
[PATCH] fix race and remove BKL from wdt977
We've seen this in several other drivers, most recently the indydog one.
If two simultaneous opens occur, they race, the device gets opened
twice, blah, blah, blah. Using atomic bitops fixes this. The BKL is
not needed.
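The single-open pattern referred to here can be sketched in portable C11 as follows. In the driver itself this would be done with the kernel's atomic bitops; the version below substitutes atomic_flag from <stdatomic.h> as a user-space analogue, and the names sketch_wdt, sketch_open and sketch_release are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical device state: one flag that says "somebody has us open". */
struct sketch_wdt {
    atomic_flag in_use;
};

/*
 * Test-and-set is a single atomic step, so of two racing openers exactly
 * one sees the flag clear and wins; the other gets -EBUSY.
 */
static int sketch_open(struct sketch_wdt *dev)
{
    assert(dev != NULL);

    if (atomic_flag_test_and_set(&dev->in_use))
        return -EBUSY;
    return 0;
}

/* Releasing the device simply clears the flag again. */
static void sketch_release(struct sketch_wdt *dev)
{
    assert(dev != NULL);

    atomic_flag_clear(&dev->in_use);
}
```

The racy variant this replaces would read a plain "is open" variable and then set it in a separate step, leaving a window in which two openers both see it clear. Because the check and the update are one atomic operation here, no outer lock is needed to close that window.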
Paul Mackerras [Mon, 15 Apr 2002 06:30:27 +0000 (23:30 -0700)]
[PATCH] fix include/linux/smp.h
This patch adds #include <linux/threads.h> to include/linux/smp.h,
because it (smp.h) needs the definition of NR_CPUS. (It so happens
that include/asm-i386/smp.h includes <linux/threads.h>, but IMHO
include/linux/smp.h shouldn't rely on that).
Andrew Morton [Mon, 15 Apr 2002 06:29:22 +0000 (23:29 -0700)]
[PATCH] don't allocate ratnodes under PF_MEMALLOC
On the swap_out() path, the radix-tree pagecache is allocating its
nodes with PF_MEMALLOC set, which allows it to completely exhaust the
free page lists(*). This is fairly easy to trigger with swap-intensive
loads.
It would be better to make those node allocations fail at an earlier
time. When this happens, the radix-tree can still obtain nodes from its
mempool, and we leave some memory available for the I/O layer.
(Assuming that the I/O is being performed under PF_MEMALLOC, which it
is).
So the patch simply drops PF_MEMALLOC while adding nodes to the
swapcache's tree.
We're still performing atomic allocations, so the rat is still biting
pretty deeply into the page reserves - under heavy load the amount of
free memory is less than half of what it was pre-rat.
It is unfortunate that the page allocator overloads !__GFP_WAIT to also
mean "try harder". It would be better to separate these concepts, and
to allow the radix-tree code (at least) to perform atomic allocations,
but to not go below pages_min. It seems that __GFP_TRY_HARDER will be
pretty straightforward to implement. Later.
The patch also implements a workaround for the mempool list_head
problem, until that is sorted out.
(*) The usual result is that the SCSI layer dies at scsi_merge.c:82.
It would be nice to have a fix for that - it's going BUG if 1-order
allocations fail at interrupt time. That happens pretty easily.
Liyang Hu [Mon, 15 Apr 2002 06:28:40 +0000 (23:28 -0700)]
[PATCH] Bug in NLS UTF-8 code
I've recently (actually, last month, but I had been a bit too busy
since then) come across a wee problem, in what I originally thought
was the VFAT code -- having `utf8' as one of the options, creating
UTF-8 file names on a VFAT partition mysteriously gains a couple of
(random) characters just after the UTF-8 escaped character: eg.
touch "fooCbar" where C is a UTF-8 escape sequence ends up creating
a file named "fooCRbar". (R being some random character.)
I eventually tracked it down to one line in fs/nls/nls_base.c -- the
UCS-2 (wchar_t) string pointer was being incremented too fast. After
consulting Ogawa Hirofumi-san on the subject, he mentioned that
include/linux/nls.h also needs to be changed for proper UTF-8
support in the NLS code.
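To illustrate the class of bug described above, here is a minimal stand-alone UTF-8 to UCS-2 conversion loop. It is not the code from fs/nls/nls_base.c; the function names are hypothetical, only 1- to 3-byte sequences are handled, and overlong encodings are not rejected. What it shows is the invariant that matters: the input pointer advances by the number of bytes consumed, while the output position advances by exactly one slot per decoded character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence; returns bytes consumed, or -1 on bad input. */
static int sketch_utf8_decode(const unsigned char *s, size_t len, uint16_t *out)
{
    assert(s != NULL && out != NULL);

    if (len == 0)
        return -1;
    if (s[0] < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {            /* 110xxxxx 10xxxxxx */
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return -1;
        *out = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {            /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return -1;
        *out = (uint16_t)(((s[0] & 0x0F) << 12) |
                          ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return 3;
    }
    return -1;                              /* outside the 16-bit range */
}

/* Convert a UTF-8 buffer; returns the number of 16-bit units written, or -1. */
static int sketch_utf8_to_ucs2(const unsigned char *s, size_t len,
                               uint16_t *dst, size_t max)
{
    size_t n = 0;

    while (len > 0 && n < max) {
        int used = sketch_utf8_decode(s, len, &dst[n]);

        if (used < 0)
            return -1;
        s += used;                 /* input: skip every byte of the sequence */
        len -= (size_t)used;
        n++;                       /* output: exactly one slot per character */
    }
    return (int)n;
}
```

If the output index were advanced by more than one after a multi-byte sequence, an uninitialized slot would be left behind in the destination, which would look much like the stray character in the "fooCRbar" example.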
- Expand configure help options a bit
- Fix xconfig bug
- Decrease queue depth if a command takes too long to complete
- Test master/slave stuff. It works, but one device can heavily starve
another. This is the simple approach right now, means that one device
will wait until the other is completely idle before starting any
commands. This is not necessary since we can have queued commands on
both devices at the same time. TODO.
- Add proc output for oldest command, just for testing.
- pci_dev compile fixes.
- Make sure ide-disk doesn't BUG if TCQ is not used, basically this was
fixed by off-loading the using_tcq setting to ide-tcq.
- Remove warning about 'queued feature set not supported'
- Abstract ide_tcq_wait_dataphase() into a function
Martin Dalecki [Mon, 15 Apr 2002 03:21:46 +0000 (20:21 -0700)]
[PATCH] 2.5.8 IDE 34
- Synchronize with 2.5.8.
- Eliminate the cdrom_log_sense() function.
- Pass a struct request to cdrom_analyze_sense_data() since this is the entity
this function is working on. This shows nicely that this function is broken.
- Use CDROM_PACKET_SIZE where appropriate.
- Kill the obfuscating cmd_buf and cmd_len local variables from
cdrom_transfer_packet_command(). This made it obvious that the parameters of
this function were not adequate - to say the least. Fix this.
- Pass a packed command array directly to cdrom_queue_packed_command(). This
is reducing the number of places where we have to deal with the c member of
struct packet_command.
- Never pass NULL as sense to cdrom_lockdoor().
- Eliminate cdrom_do_block_pc().
- Eliminate the c member of struct packet_command. Pass them through struct
request cmd member.
- Don't enable TCQ unconditionally if there is a TCQ queue depth defined.
- Fix small thinko in ide_cmd_ioctl() rewrite. (My apologies to everyone who
has to use hdparm to setup his system...)
the IRQ balancing feature is based on the following requirements:
- irq handlers should be cache-affine to a large degree, without the
explicit use of /proc/irq/*/smp_affinity.
- idle CPUs should be preferred over busy CPUs when directing IRQs towards
them.
- the distribution of IRQs should be random, to avoid all IRQs going to
the same CPU, and to avoid 'heavy' IRQs from loading certain CPUs
unfairly over CPUs that handle 'light' IRQs. The IRQ system has no
knowledge about how 'heavy' an IRQ handler is in terms of CPU cycles.
here is the design and implementation:
- we make per-irq decisions about where the IRQ will go to next. Right
now it's a fastpath and a slowpath, the real stuff happens in the slow
path. The fastpath is very lightweight.
- [ i decided not to measure IRQ handler overhead via RDTSC - it ends up
being very messy, and if we want to be 100% fair then we also need to
measure softirq overhead, and since there is no 1:1 relationship
between softirq load and hardirq load, it's impossible to do
correctly. So the IRQ balancer achieves fairness via randomness. ]
- we stay affine in the micro timescale, and we are loading the CPUs
fairly in the macro timescale. The IO-APIC's lowest priority
distribution method rotated IRQs between CPUs once per IRQ, which was
the worst possible solution for good cache-affinity.
- to achieve fairness and to avoid lock-step situations some real
randomness is needed. The IRQs will wander in the allowed CPU group
randomly, in a Brownian motion fashion. This is what the 'move()'
function accomplishes. The IRQ moves one step forward or one step
backwards in the allowed CPU mask. [ Note that this achieves a level of
NUMA affinity as well, nearby CPUs are more likely to be NUMA-affine. ]
- the irq balancer has some knowledge about 'how idle' a single CPU is.
The idle task updates the idle_timestamp. Since this update is in the
idle-to-be codepath, it does not increase the latency of idle-wakeup,
the overhead should be zero in all cases that matter. The idle-balancing
happens the following way: when searching for the next target CPU after
an 'IRQ tick' has expired, we first search 'idle enough' CPUs in the
allowed set. If this does not succeed then we search all CPUs.
- the patch is fully compatible with the /proc/irq/*/smp_affinity
interface as well, everything works as expected.
note that the current implementation can be expressed equivalently in
terms of timer-interrupt-driven IRQ redirection. But i wanted to get some
real feedback before removing the possibility to do finer grained
decisions - and the per-IRQ overhead is very small anyway.
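The single-step walk through the allowed CPU mask that 'move()' is described as doing can be sketched as below. This is a simplified illustration, not the patch's implementation: the name sketch_move, its signature, and the use of a plain unsigned long as the mask are assumptions, and the caller is assumed to pick the direction at random.

```c
#include <assert.h>

/*
 * Step from cur_cpu to the neighbouring allowed CPU, forward or backward,
 * wrapping around at the ends. Bit n of allowed_mask set means CPU n may
 * receive the IRQ. Returns cur_cpu unchanged if no CPU is allowed.
 */
static int sketch_move(int cur_cpu, unsigned long allowed_mask,
                       int nr_cpus, int forward)
{
    assert(nr_cpus > 0 && nr_cpus <= (int)(sizeof(unsigned long) * 8));
    assert(cur_cpu >= 0 && cur_cpu < nr_cpus);

    int cpu = cur_cpu;

    for (int tries = 0; tries < nr_cpus; tries++) {
        /* One step in the chosen direction, with wrap-around. */
        cpu = forward ? (cpu + 1) % nr_cpus
                      : (cpu + nr_cpus - 1) % nr_cpus;

        if (allowed_mask & (1UL << cpu))
            return cpu;
    }
    return cur_cpu;
}
```

For example, with four CPUs and an allowed mask covering CPUs 0, 1 and 3, a forward step from CPU 1 skips the disallowed CPU 2 and lands on CPU 3, and a backward step from CPU 0 wraps around to CPU 3. Since each step lands on the nearest allowed CPU in the chosen direction, the IRQ tends to stay among nearby CPUs, consistent with the NUMA remark above.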
Alexander Schulz [Sun, 14 Apr 2002 18:41:48 +0000 (19:41 +0100)]
[PATCH] 1107/1: Shark: defconfig and updates
This patch updates the defconfig for the Shark and adds an
extern and a define so that the kernel compiles for the Shark.
[PATCH] 1101/1: Make armksyms.c compile again with gcc 3.0.2
Make arch/arm/kernel/armksyms.c compile again with gcc 3.0.2 because of new EXPORT_SYMBOL_NOVERS(abort); in patch-2_4_18-rmk3.gz. See my mail "EXPORT_SYMBOL_NOVERS(abort) in armksyms.c?" in linux-arm-kernel list from Mon, 25 Mar 2002.
For !CONFIG_SMP we want the empty inline setup_per_cpu_areas().
If CONFIG_SMP is set, we never want the empty inline. If we use the
generic implementation, we have it here, if not the arch has it somewhere
else (hopefully).
This makes the cpqfc driver recognize the HP Tachyon. I moved the
device list to an __initdata structure so the driver doesn't build it at
runtime and changed it to use the proper PCI_DEVICE_ID_* names.
With this patch applied, the driver happily detects the disks attached
to my HP Tachyon.