Andrew Morton [Sat, 10 Aug 2002 11:40:05 +0000 (04:40 -0700)]
[PATCH] Infrastructure for atomic user accesses
Well the optimum solution there would be to create and use
`inc_preempt_count_non_preempt()'. I don't see any
way of embedding this in kmap_atomic() or copy_to_user_atomic()
without loss of flexibility or incurring a double-inc somewhere.
Andrew Morton [Sat, 10 Aug 2002 09:44:55 +0000 (02:44 -0700)]
[PATCH] fix a race between set_page_dirty and truncate
Fix a race between set_page_dirty() and truncate.
The page could have been removed from the mapping while this CPU is
spinning on the lock. __free_pages_ok() will go BUG.
This has not been observed in practice - most callers of
set_page_dirty() hold the page lock which gives exclusion from
truncate. But zap_pte_range() does not.
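The fix is the usual "recheck after taking the lock" pattern. The toy model below is a sketch of that pattern only: every name in it is hypothetical, and a callback stands in for truncate running on another CPU while this one spins on the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: all names here are hypothetical. */
struct mapping_sk { int locked; int ndirty; };
struct page_sk { struct mapping_sk *mapping; int dirty; };

static void lock_sk(struct mapping_sk *m)   { assert(!m->locked); m->locked = 1; }
static void unlock_sk(struct mapping_sk *m) { assert(m->locked);  m->locked = 0; }

/* Returns 1 if the page was put on the mapping's dirty list. The optional
 * hook runs where the real code may spin on the lock, so a caller can
 * simulate truncate() winning the race at exactly that moment. */
static int set_page_dirty_sk(struct page_sk *page,
                             void (*while_spinning)(struct page_sk *))
{
    struct mapping_sk *m;
    int moved = 0;

    if (page->dirty)
        return 0;
    page->dirty = 1;
    m = page->mapping;
    if (m == NULL)
        return 0;
    if (while_spinning != NULL)
        while_spinning(page);
    lock_sk(m);
    if (page->mapping != NULL) {   /* recheck: truncate may have detached it */
        m->ndirty++;
        moved = 1;
    }
    unlock_sk(m);
    return moved;
}

/* Simulated truncate: detaches the page from its mapping. */
static void truncate_sk(struct page_sk *page) { page->mapping = NULL; }
```

Without the recheck, the second page in a test like the one below would be put on a list it no longer belongs to.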
Andrew Morton [Sat, 10 Aug 2002 09:44:47 +0000 (02:44 -0700)]
[PATCH] sync get_user_pages with 2.4
Forward port of get_user_pages() change from 2.4.
- If the vma is marked as a VM_IO area, fail the mapping.
This prevents kernel deadlocks which occur when applications which
have frame buffers mapped try to dump core. Also prevents a kernel
oops when a debugger is attached to a process which has an IO mmap.
- Check that the mapped page is inside mem_map[] (pfn_valid).
- inline follow_page() and remove the preempt_disable()s. It has
only a single call site, which calls it under a spinlock.
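The two new checks amount to "refuse I/O mappings, refuse pages outside mem_map[]". Here is a minimal sketch under stated assumptions: the flag value, the struct, the mem_map[] size and the choice of -EFAULT as the error are all illustrative, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical values for illustration only. */
#define VM_IO_SK 0x1UL
struct vma_sk { unsigned long flags; };

static const unsigned long max_mapnr_sk = 1024;  /* pages covered by mem_map[] */

static int pfn_valid_sk(unsigned long pfn) { return pfn < max_mapnr_sk; }

/* Returns 0 if the page may be pinned, -EFAULT (assumed) otherwise. */
static int check_user_page(const struct vma_sk *vma, unsigned long pfn)
{
    if (vma->flags & VM_IO_SK)
        return -EFAULT;   /* I/O mapping, e.g. a frame buffer: don't touch it */
    if (!pfn_valid_sk(pfn))
        return -EFAULT;   /* no struct page in mem_map[] for this pfn */
    return 0;
}
```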
Andrew Morton [Sat, 10 Aug 2002 09:44:42 +0000 (02:44 -0700)]
[PATCH] tunable ext3 commit interval
This patch from Stephen Tweedie allows users to modify the journal
commit interval for the ext3 filesystem.
The commit interval is normally five seconds. For portable computers
with spun-down drives it is advantageous to be able to increase the
commit interval.
There may also be advantages in decreasing the commit interval for
specialised applications such as heavily-loaded NFS servers which are
using synchronous exports.
Laptop users will also need to increase the pdflush periodic
writeback interval (/proc/sys/vm/dirty_writeback_centisecs), because
the `kupdate' activity also forces a commit.
To specify the commit interval, use
mount -o commit=30 /dev/hda1 /mnt/whatever
or
mount -o remount,commit=30 /dev/hda1
The commit interval is specified in units of seconds.
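To show how a "commit=N" option in seconds could become a timer interval, here is a hedged sketch. It is not ext3's option parser: the HZ value of 100, the function name and the -1 error convention are assumptions; only the five-second default comes from the text above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HZ_SK 100               /* assumed tick rate, for illustration */
#define DEFAULT_COMMIT_SECS 5   /* default stated in the changelog */

/* Scan a comma-separated option string such as "remount,commit=30" and
 * return the commit interval in jiffies, or -1 for a malformed value. */
static long commit_interval_jiffies(const char *opts)
{
    long secs = DEFAULT_COMMIT_SECS;
    const char *p = opts;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > 7 && strncmp(p, "commit=", 7) == 0) {
            char *stop;
            long v = strtol(p + 7, &stop, 10);
            if (stop != p + len || v < 0)
                return -1;      /* trailing junk or negative value */
            secs = v;
        }
        p = end ? end + 1 : NULL;
    }
    return secs * HZ_SK;
}
```

With these assumptions, both mount invocations shown above would yield 3000 jiffies.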
Andrew Morton [Sat, 10 Aug 2002 09:44:38 +0000 (02:44 -0700)]
[PATCH] copy_strings speedup
This is the first of three patches which reduce the amount of
kmap/kunmap traffic on highmem machines.
The workload which was tested was RAM-only dbench. This is dominated
by copy_*_user() costs.
The three patches speed up my 4xPIII by 3%.
The three patches speed up a 16P NUMA-Q by 100 to 150%.
The first two patches (copy_strings and pagecache reads) speed up an
8-way by 15%. I expect that all three patches will speed up the 8-way
by 40%.
Some of the benefit is from reduced pressure on kmap_lock. Most of it
is from reducing the number of global TLB invalidations.
This patch fixes up copy_strings(). copy_strings does a huge amount of
kmapping. Martin Bligh has noted that across a kernel compile this
function is the second or third largest user of kmaps in the kernel.
The fix is pretty simple: just hang onto the previous kmap as we go
around the loop. It reduces the number of kmappings from copy_strings
by a factor of 30.
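The idea of hanging onto the previous kmap can be sketched as a loop that only remaps when the destination page changes. This is an illustration of the technique, not copy_strings() itself: the map/unmap functions are hypothetical counters standing in for kmap()/kunmap(), and the page size is assumed to be 4096.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SK 4096UL
#define NO_PAGE ((unsigned long)-1)

/* Counts simulated map calls; stands in for the real, costly kmap(). */
static unsigned long kmap_calls;

static void map_page_sk(unsigned long pfn)   { (void)pfn; kmap_calls++; }
static void unmap_page_sk(unsigned long pfn) { (void)pfn; }

/* Copy len bytes to offset pos in chunks of at most `chunk` bytes, never
 * crossing a page boundary in one step. With cache != 0 the previous
 * mapping is kept while chunks stay on the same page; otherwise every
 * chunk maps its page afresh. Returns the number of map calls. */
static unsigned long copy_chunks(unsigned long pos, size_t len,
                                 size_t chunk, int cache)
{
    unsigned long mapped = NO_PAGE;

    assert(chunk > 0);
    kmap_calls = 0;
    while (len > 0) {
        size_t room = (size_t)(PAGE_SIZE_SK - pos % PAGE_SIZE_SK);
        size_t n = len < chunk ? len : chunk;
        unsigned long pfn = pos / PAGE_SIZE_SK;

        if (n > room)
            n = room;
        if (!cache || pfn != mapped) {
            if (mapped != NO_PAGE)
                unmap_page_sk(mapped);
            map_page_sk(pfn);
            mapped = pfn;
        }
        /* memcpy() into the mapped page would go here */
        pos += n;
        len -= n;
    }
    if (mapped != NO_PAGE)
        unmap_page_sk(mapped);
    return kmap_calls;
}
```

Copying two pages in 32-byte pieces takes 256 map calls without the cache and 2 with it; the real reduction depends on the argument strings being copied.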
Andrew Morton [Sat, 10 Aug 2002 09:44:34 +0000 (02:44 -0700)]
[PATCH] 3c905B fix
Patch from Zwane which fixes a transceiver problem on his 3c905B.
Apparently the 905B's MII status register is saying that it doesn't
need preamble, but the datasheet says that it does. So add a 905B
override for that in the device table.
This could break other 3c905Bs. I don't know. There's only one way
to find out.
Russell King [Sat, 10 Aug 2002 09:44:30 +0000 (02:44 -0700)]
[PATCH] build warning fix
This patch has been verified to apply cleanly to 2.5.30.
This patch fixes a build warning in smp.h. register_cpu_notifier uses
struct notifier_block in its argument list. Unfortunately, there are
places where smp.h is included before the definition of this structure.
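The changelog does not show the fix itself; a common way to silence this class of warning is an incomplete (forward) declaration of the struct before the prototype. The sketch below assumes that approach and uses a hypothetical register_cpu_notifier_sketch(); the struct fields are modeled on notifier_block but should be treated as illustrative.

```c
#include <assert.h>

/* Forward declaration: enough to use pointers to the type in a prototype.
 * Without it, a struct first named inside a parameter list has prototype
 * scope only, and compilers warn that it is declared inside the list. */
struct notifier_block;

int register_cpu_notifier_sketch(struct notifier_block *nb);

/* The full definition may arrive later, e.g. from another header. */
struct notifier_block {
    int (*notifier_call)(struct notifier_block *self,
                         unsigned long action, void *data);
    struct notifier_block *next;
    int priority;
};

static struct notifier_block *chain;

/* Hypothetical registration: push onto a singly linked chain. */
int register_cpu_notifier_sketch(struct notifier_block *nb)
{
    nb->next = chain;
    chain = nb;
    return 0;
}
```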
Pavel Machek [Sat, 10 Aug 2002 09:44:25 +0000 (02:44 -0700)]
[PATCH] S3 and swsusp: fixing device_resume order
PCI drivers' resume methods must not be called during RESUME_POWER_ON,
because interrupts are still off and the i8259A is not yet initialized
[OHCI kills the machine in that case, and CardBus probably does too. PCI
drivers simply assume that interrupts are initialized.]
The second hunk makes the device_resume() calls conform to the
documentation.
Alexander Viro [Sat, 10 Aug 2002 09:22:02 +0000 (02:22 -0700)]
[PATCH] fix check_disk_change() deadlocks
Small, but tricky: fix for check_disk_change() deadlocks.
What we do is
a) opening the block device moved from check_partition() to
grok_partitions(); check_partition() now takes an opened
struct block_device.
b) all callers of check_disk_change() fall in two groups -
ones that are called only from some ->open() and ones
that are _never_ called from ->open(). There is no
middle ground. We split the thing into two functions -
check_disk_change() for the first class and full_check_....
for the second. The former (ones inside ->open()) doesn't
touch partition tables but marks the bdev as "had been
invalidated". At the end of do_open() we check whether the
bdev is marked and, if it is, call wipe_partitions()/check_partition();
at that point the bdev is fully set up and ready.
c) ->bd_part_sem kludge is gone - we use ->bd_sem instead.
That is, do_open() on a partition grabs ->bd_sem on entire
disk and picks partition data while under it; do_open() on
entire disk rereads partition if needed before dropping
->bd_sem (right before dropping it); BLKRRPART does
trylock on ->bd_sem and then checks ->bd_part_count -
the same logic as before, except that we use ->bd_sem instead
of ->bd_part_sem.
That kills recursive open(), gives us the same exclusion rules as
we had and makes sure that actual IO (including rereading partition
tables) is done only when we are ready to do it.
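The core of (b) is "mark now, reread later": inside ->open() only a flag is set, and the actual partition I/O happens at the end of do_open() once the bdev is ready. The toy model below sketches that pattern; every name in it is hypothetical.

```c
#include <assert.h>

/* Toy model of deferred invalidation; names are hypothetical. */
struct bdev_sk {
    int invalidated;      /* set by check_disk_change() inside ->open() */
    int partition_reads;  /* how many times partitions were (re)read */
    int fully_set_up;
};

static void check_disk_change_sk(struct bdev_sk *b, int media_changed)
{
    /* Inside ->open() the bdev is not ready: only record the fact. */
    if (media_changed)
        b->invalidated = 1;
}

static void reread_partitions_sk(struct bdev_sk *b)
{
    assert(b->fully_set_up);   /* I/O happens only once the bdev is ready */
    b->partition_reads++;
}

static void do_open_sk(struct bdev_sk *b, int media_changed)
{
    check_disk_change_sk(b, media_changed);   /* the driver's ->open() */
    b->fully_set_up = 1;
    if (b->invalidated) {                     /* end of do_open() */
        b->invalidated = 0;
        reread_partitions_sk(b);
    }
}
```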
It actually sounds a lot nastier than it is. do_open() is one sick
puppy right now, but we have everything in one place and _out_ of drivers
(and 20-odd equally sick puppies are gone from them, along with about
the same number of races).
Now we are almost ready to clean it up for good - all that remains to
do before that is to get the rest of drivers (cciss, DAC960, i2o and
a couple of ancients - xd and acsi) using per-disk gendisks. Then
most of that crap will disappear.
BTW, the only generic ioctl remaining in the drivers is HDIO_GETGEO -
a lot of foo_ioctl() functions start with if (cmd != HDIO_GETGEO) return -EINVAL; ;-)
Alexander Viro [Sat, 10 Aug 2002 09:21:49 +0000 (02:21 -0700)]
[PATCH] partition table flush/read cleanup
Big One. Flushing/rereading partition tables has been taken out of
->revalidate() for partitioned devices; now it's done in the
caller (check_disk_change()). BLKRRPART handling has also moved
out of drivers - they are still allowed to override it (DAC960
and i2o are the only remaining ones), but the common case is handled
in fs/block_dev.c.
Note: we are still only shifting stuff - bd_sem deadlocks in
check_disk_change() are still there. However, now we have all
relevant code outside of drivers, and that will allow us to fix the
thing (see next patches).
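"Drivers may still override it, but the common case is handled generically" is an optional-hook-with-fallback pattern. The sketch below assumes a hypothetical per-driver rrpart hook; the real override mechanism in fs/block_dev.c may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver operations: rrpart == NULL means
 * "use the common path". */
struct disk_ops_sk {
    int (*rrpart)(void);
};

static int common_rereads;

/* Stand-in for the generic flush-and-reread code. */
static int common_rrpart(void)
{
    common_rereads++;
    return 0;
}

static int blkrrpart_sk(const struct disk_ops_sk *ops)
{
    if (ops->rrpart != NULL)
        return ops->rrpart();   /* driver override (e.g. DAC960, i2o) */
    return common_rrpart();     /* every other driver */
}

/* Example override with a recognisable return value. */
static int special_rrpart(void) { return 42; }
```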
Alexander Viro [Sat, 10 Aug 2002 09:21:45 +0000 (02:21 -0700)]
[PATCH] ide subdrivers attach() cleanup
->attach() for ide subdrivers explicitly calls register_disk()
instead of ata_revalidate() now; revalidate_drives() is gone -
it's not needed anymore (we _know_ that we'll read partition
table as soon as driver claims the drive; no need to mess with
bogus rereading).
Alexander Viro [Sat, 10 Aug 2002 09:21:40 +0000 (02:21 -0700)]
[PATCH] clean up major_name
->major_name for per-disk gendisks set to full name - i.e.
IDE gendisks have "hda", "hdb", etc. instead of "hd".
As a result, we kill a lot of crap in check.c::disk_name().
In particular, we can now afford ->minor_shift set to 0
for ide-cd (disk_name() was the only obstacle).
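Once ->major_name holds the full per-disk name ("hda" rather than "hd"), naming a partition reduces to appending its number. The helper below is a hypothetical sketch of that simplification, not the remaining check.c code, which may handle more cases.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: part == 0 names the whole disk. */
static char *disk_name_sk(const char *major_name, int part,
                          char *buf, size_t len)
{
    if (part == 0)
        snprintf(buf, len, "%s", major_name);
    else
        snprintf(buf, len, "%s%d", major_name, part);
    return buf;
}
```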
Alexander Viro [Sat, 10 Aug 2002 09:21:36 +0000 (02:21 -0700)]
[PATCH] make check_disk_change() use struct block_device
check_disk_change() converted to passing struct block_device.
Old variant is still needed for a couple of places; wrapper
is provided (__check_disk_change(kdev)). The do_open() logic
for setting ->bd_op has been sanitized - we now do that before
calling ->open().
- inline grab_cache_page() in pagemap.h, it's just a simple wrapper
around find_or_create_page()
- rename (__)remove_inode_page to (__)remove_from_page_cache and
move them from mm.h and swap.h to pagemap.h because they reverse
add_to_page_cache and that's where they belong.
Ivan Kokshaysky [Sat, 10 Aug 2002 09:03:21 +0000 (02:03 -0700)]
[PATCH] alpha: misc fixes [9/10]
Set of small fixes:
- pcibios_init() must be int;
- fls() - ctlz on ev67, generic on others. This was required for
something several kernel releases back; now it seems to be unused.
Anyway, it shouldn't hurt, so it's included here.
- missing #includes, missing #if RTC_IRQ in drivers/char/rtc.c;
- define USER_HZ;
From Jeff Wiedemeier:
- rename alpha-specific config section 'General setup' to 'System setup'
to avoid confusion with generic 'General setup';
- fix the 'bootpfile' build.
- osf_getrusage() updated for new utime/stime fields of the task_struct;
- compatibility wrappers for OSF/1 v4 readv/writev syscalls:
forward port from 2.4.19.
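fls() returns the 1-based index of the most significant set bit, with fls(0) == 0. Below are two equivalent sketches: a portable loop ("generic") and a count-leading-zeros form in the spirit of the EV67 ctlz path. __builtin_clz is a GCC/Clang builtin, undefined for 0 (hence the guard); the sketch assumes a 32-bit unsigned int and is not the Alpha kernel code.

```c
#include <assert.h>

/* Portable fls(): shift until no bits remain. */
static int fls_generic(unsigned int x)
{
    int r = 0;

    while (x) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Count-leading-zeros variant; __builtin_clz(0) is undefined. */
static int fls_clz(unsigned int x)
{
    return x ? (int)(sizeof(x) * 8) - __builtin_clz(x) : 0;
}
```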
Ivan Kokshaysky [Sat, 10 Aug 2002 09:03:08 +0000 (02:03 -0700)]
[PATCH] alpha: interrupt/preempt update [6/10]
This one is large mostly because of massive code deletion.
- cli, sti and so on go away;
- irq_smp.c goes to /dev/null; the only leftover (synchronize_irq)
moved to irq.c;
- hardirq count field in the preemption counter extended to 12 bits -
one more than required for wildfire.
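To make "a 12-bit hardirq field in the preemption counter" concrete, here is a sketch of how such a field is typically carved out with shifts and masks. The layout (8 preempt bits, then 8 softirq bits, then 12 hardirq bits) is an assumption for illustration and may not match the Alpha headers.

```c
#include <assert.h>

/* Assumed layout, for illustration only. */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   12

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define HARDIRQ_MASK   (((1UL << HARDIRQ_BITS) - 1) << HARDIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)   /* added on irq entry */

/* Extract the hardirq nesting count; a 12-bit field holds up to 4095. */
static unsigned long hardirq_count_of(unsigned long preempt_count)
{
    return (preempt_count & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
}
```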
Ivan Kokshaysky [Sat, 10 Aug 2002 09:02:59 +0000 (02:02 -0700)]
[PATCH] alpha: regdef.h [4/10]
Historically, assembly routines included the libc header <alpha/regdef.h>
for OSF/1 register names. With the new kernel build system
it doesn't work anymore. Make our own copy in <include/asm>.
Ivan Kokshaysky [Sat, 10 Aug 2002 09:02:51 +0000 (02:02 -0700)]
[PATCH] alpha: IPI update [2/10]
- send_ipi_message() fix from Jeff Wiedemeier:
The 2.5.30 IPI algorithm (with the to_whom == set test) incorrectly sends
IPI messages to CPU 0 in an SMP system running with one processor. In this
case to_whom is often 0 (cpu_present_mask & ~1UL << smp_processor_id()),
which ends up triggering the to_whom == set case.
- migration IPI removed;
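The failure mode is an empty "everyone but me" mask being treated as a meaningful target set. The sketch below is not the Alpha send_ipi_message(); it is a hypothetical illustration that handles the empty set explicitly. It also writes the mask with explicit parentheses, since in C `~1UL << n` parses as `(~1UL) << n` and clears every bit below n as well (the two forms agree only for CPU 0).

```c
#include <assert.h>

/* Hypothetical helper: present CPUs other than `self`. */
static unsigned long other_cpus(unsigned long present_mask, unsigned self)
{
    return present_mask & ~(1UL << self);
}

/* Returns how many IPIs would be sent; a uniprocessor sends none. */
static int send_ipi_sk(unsigned long present_mask, unsigned self)
{
    unsigned long to_whom = other_cpus(present_mask, self);
    int sent = 0;

    if (to_whom == 0)
        return 0;   /* nobody else to notify */
    for (unsigned cpu = 0; cpu < 8 * sizeof to_whom; cpu++)
        if (to_whom & (1UL << cpu))
            sent++;
    return sent;
}
```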
This starts a large set of alpha patches accumulated since 2.5.18 or
even earlier. All of this was reasonably well tested.
Thanks to Jeff Wiedemeier for SMP testing and fixes.
- sync up with (2.5.18?) pte/pfn/page/tlb etc. macros;
- asm-generic/tlb.h: loading unsigned long constant to unsigned int
tlb->nr causes compiler warnings on 64 bit platforms.
NTFS: 2.0.24 - Cleanups.
- Treat BUG_ON() as ASSERT() not VERIFY(), i.e. do not use side effects
inside BUG_ON(). (Adam J. Richter)
- Split logical OR expressions inside BUG_ON() into individual BUG_ON()
calls for improved debugging. (Adam J. Richter)
- Add errors flag to the ntfs volume state, accessed via
NVol{,Set,Clear}Errors(vol).
- Do not allow read-write remounts of read-only volumes with errors.
- Clarify comment for ntfs file operation sendfile which was added by
Christoph Hellwig a while ago (just using generic_file_sendfile())
to say that ntfs ->sendfile is only used for the case where the
source data is on the ntfs partition and the destination is
somewhere else, i.e. nothing we need to concern ourselves with.
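Both BUG_ON() rules can be demonstrated with two hypothetical check macros, one live and one compiled out in the way assert() is under NDEBUG. If a check may vanish, any work done inside it vanishes too; and one check per condition lets a failure name the culprit. The macro names are illustrative, not NTFS's own.

```c
#include <assert.h>
#include <string.h>

static int check_failures;
static const char *last_failed;

/* Live check: records which condition fired. */
#define CHECK_ON(cond) \
    do { if (cond) { check_failures++; last_failed = #cond; } } while (0)
/* Compiled-out check: sizeof does not evaluate its operand. */
#define CHECK_ON_COMPILED_OUT(cond) do { (void)sizeof(cond); } while (0)

static int refs;
static int get_ref(void) { return ++refs; }

/* Wrong: the reference is only taken when checks are compiled in. */
static void take_ref_wrong(void) { CHECK_ON_COMPILED_OUT(get_ref() > 100); }

/* Right: do the work unconditionally, check only its result. */
static void take_ref_right(void)
{
    int r = get_ref();
    CHECK_ON_COMPILED_OUT(r > 100);
}

/* A combined check cannot tell which half failed; split checks can. */
static void check_combined(int a_bad, int b_bad) { CHECK_ON(a_bad || b_bad); }
static void check_split(int a_bad, int b_bad)
{
    CHECK_ON(a_bad);
    CHECK_ON(b_bad);
}
```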
Matthew Wilcox [Tue, 6 Aug 2002 07:53:18 +0000 (00:53 -0700)]
[PATCH] fix expand_stack for upward-growing stacks
- trivial: cache file->f_dentry->d_inode; saves a few bytes of compiled
size.
- move expand_stack inside ARCH_STACK_GROWSUP, add an alternate
implementation for PA-RISC.
- partially fix the comment (mmap_sem is held for READ, not for WRITE).
It still doesn't make sense, saying we don't need to take the spinlock
right before we take it. I expect one of the vm hackers will know
what the right thing is.
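The difference between the two expand_stack() variants is which end of the vma moves and which way the address is rounded. This is a sketch of that arithmetic only, assuming 4096-byte pages and a vma covering [start, end); the real functions also check rlimits and update accounting, which is omitted here.

```c
#include <assert.h>

#define PAGE_SIZE_SK 4096UL
#define PAGE_MASK_SK (~(PAGE_SIZE_SK - 1))

struct vma_sk { unsigned long start, end; };   /* covers [start, end) */

/* Grows-down stack: move start down to the page containing address. */
static unsigned long grow_down(struct vma_sk *v, unsigned long address)
{
    unsigned long new_start = address & PAGE_MASK_SK;
    unsigned long pages = 0;

    if (new_start < v->start) {
        pages = (v->start - new_start) / PAGE_SIZE_SK;
        v->start = new_start;
    }
    return pages;
}

/* Grows-up stack (e.g. PA-RISC): move end past the page containing address. */
static unsigned long grow_up(struct vma_sk *v, unsigned long address)
{
    unsigned long new_end = (address + PAGE_SIZE_SK) & PAGE_MASK_SK;
    unsigned long pages = 0;

    if (new_end > v->end) {
        pages = (new_end - v->end) / PAGE_SIZE_SK;
        v->end = new_end;
    }
    return pages;
}
```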
Testing of course revealed some bugs introduced during the cleanups,
so these are fixed here with a couple of other small bits, like improved
debugging code.
Paul Mackerras [Tue, 6 Aug 2002 06:38:41 +0000 (16:38 +1000)]
PPC32: miscellaneous small fixes.
Rename print_backtrace to show_stack and improve it, remove the
#if 0 around set_fpexc_mode and add get_fpexc_mode, add the
__NR_security define and reserve syscall 225 for Tux.
PPC32: interrupt fixes along the lines of Ingo's changes to x86.
We don't unmask the interrupt at the end of handling it if there
is no action (i.e. someone has done free_irq). Add some likely
and unlikely hints and fix synchronize_irq.
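The interrupt change and the branch hints can be sketched together. The likely()/unlikely() definitions below use GCC's __builtin_expect in the conventional way; the descriptor struct and do_irq_sk() are hypothetical simplifications of the PPC32 code, which also tracks status flags omitted here.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

struct irq_desc_sk {
    void (*action)(void);   /* NULL after free_irq() */
    int masked;
};

static int handled;
static void my_handler(void) { handled++; }

/* Mask on entry; unmask at the end only if a handler is still installed. */
static void do_irq_sk(struct irq_desc_sk *d)
{
    d->masked = 1;
    if (likely(d->action != NULL))
        d->action();
    /* No action (someone has done free_irq()): leave the line masked. */
    if (likely(d->action != NULL))
        d->masked = 0;
}
```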
Patrick Mochel [Mon, 5 Aug 2002 04:13:08 +0000 (21:13 -0700)]
driverfs: decrement refcount on dentry being removed, not directory
This brain fart is left over from some cleanup of these functions a _long_
time ago. We need to dput() the dentry, since we have an implicit count of
one left over from the create function.
Instead, we were calling dput() on the directory it was in, for which
we had no matching dget().
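The reference-count rule is that the dput() on removal must balance the implicit reference left by create, on the same object. Here is a toy model with hypothetical names (not the dcache API), showing that the directory's count is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Toy dentry with a reference count; names are hypothetical. */
struct dentry_sk { int count; struct dentry_sk *parent; };

static void dput_sk(struct dentry_sk *d)
{
    assert(d->count > 0);   /* an unmatched dput() would underflow */
    d->count--;
}

/* Creating a file leaves one implicit reference on the new dentry. */
static void create_sk(struct dentry_sk *d, struct dentry_sk *dir)
{
    d->parent = dir;
    d->count = 1;
}

/* Removal drops that reference on the dentry itself, not on its parent. */
static void remove_sk(struct dentry_sk *d)
{
    dput_sk(d);
}
```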
Dave Kleikamp [Mon, 5 Aug 2002 03:35:46 +0000 (22:35 -0500)]
Rework JFS's inode locking
In order for JFS to be able to quiesce the current activity, while
blocking new transactions, the locking needed some rework. New
transactions are stopped in the functions txBegin or txBeginAnon,
where the rdwrlock (IREAD_LOCK/IWRITE_LOCK) may be held. Dirty
inodes may need to be committed while new transactions are blocked
here, so another lock is introduced (commit_sem) which is taken after
txBegin/txBeginAnon is called. This ensures that the proper
serialization takes place, without the write_inode method needing to
grab the rdwrlock.
In addition, the use of IWRITE_LOCK and IREAD_LOCK has been removed
from directory inodes. The serialization done by the VFS using i_sem
is sufficient to avoid races.
This patch removes JFS's dependency on down_write_trylock.
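The new scheme is essentially a lock order: rdwrlock (when taken), then the txBegin blocking point, then commit_sem, with write_inode entering at txBegin. The sketch below encodes that order as ranks and asserts it is never inverted. Treating txBegin as a rank is a modeling assumption, and tracking only the innermost rank held is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks for the ordering described above; lower is outer. */
enum lock_rank { RANK_NONE, RANK_RDWRLOCK, RANK_TXBEGIN, RANK_COMMIT_SEM };

static enum lock_rank held = RANK_NONE;   /* innermost rank currently held */

static bool may_acquire(enum lock_rank r) { return r > held; }

/* Acquire in strictly increasing rank; returns the rank to restore. */
static enum lock_rank acquire(enum lock_rank r)
{
    enum lock_rank prev = held;

    assert(may_acquire(r));   /* an inverted order could deadlock */
    held = r;
    return prev;
}

static void release(enum lock_rank prev) { held = prev; }

/* A regular file transaction: rdwrlock, then txBegin, then commit_sem. */
static void file_transaction(void)
{
    enum lock_rank a = acquire(RANK_RDWRLOCK);
    enum lock_rank b = acquire(RANK_TXBEGIN);
    enum lock_rank c = acquire(RANK_COMMIT_SEM);
    /* ... modify metadata and commit ... */
    release(c);
    release(b);
    release(a);
}

/* write_inode: no rdwrlock, but commit_sem still comes after txBegin. */
static void write_inode_sk(void)
{
    enum lock_rank b = acquire(RANK_TXBEGIN);
    enum lock_rank c = acquire(RANK_COMMIT_SEM);
    release(c);
    release(b);
}
```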
NTFS: 2.0.23 - Major bug fixes (races, deadlocks, non-i386 architectures).
- Massive internal locking changes to mft record locking. Fixes lock
recursion and replaces the mrec_lock read/write semaphore with a
mutex. Also removes the now superfluous mft_count. This fixes several
race conditions and deadlocks, especially in the future write code.
- Fix ntfs over loopback for compressed files by adding an
optimization barrier. (gcc was screwing up otherwise?)
- Miscellaneous cleanups all over the code and a fix or two in error
handling code paths.
Thanks go to Christoph Hellwig for pointing out the following two:
- Remove now unused function fs/ntfs/malloc.h::vmalloc_nofs().
- Fix ntfs_free() for ia64 and parisc by checking for VMALLOC_END, too.
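The ntfs_free() fix boils down to a two-sided range test. If kmalloc() memory can sit above the vmalloc window on some architectures, testing only the lower bound would send it to vfree(). In this sketch the window bounds are made-up values (the real VMALLOC_START/VMALLOC_END are per-architecture), and the enum stands in for the actual vfree()/kfree() calls.

```c
#include <assert.h>

/* Hypothetical vmalloc window for illustration. */
#define VMALLOC_START_SK 0x1000000UL
#define VMALLOC_END_SK   0x2000000UL

enum freer { FREE_KFREE, FREE_VFREE };

/* Only addresses inside [VMALLOC_START, VMALLOC_END) came from vmalloc(). */
static enum freer pick_freer(unsigned long addr)
{
    if (addr >= VMALLOC_START_SK && addr < VMALLOC_END_SK)
        return FREE_VFREE;
    return FREE_KFREE;
}
```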
Clean up eepro100 update from David M-T:
- remove outdated comment about 2.3-only
- style up David's changelog entry like the others
- replace ifdef RX_ALIGN with an rx_align() macro
- kill pointless #if defined(MODULE) || defined(CONFIG_HOTPLUG)
around ->remove.