Trond Myklebust [Mon, 23 Aug 2004 15:21:20 +0000 (11:21 -0400)]
NFSv2/v3/v4: Make the rpc_ops->getattr method take a filehandle
rather than an inode argument. Fix up nfs_instantiate() and
_nfs4_do_open to use this since doing a new lookup might be racy.
Trond Myklebust [Mon, 23 Aug 2004 15:19:03 +0000 (11:19 -0400)]
NFSv2/v3/v4: Place NFS nfs_page shared data into a single structure
that hangs off filp->private_data. As a side effect, this also
cleans up the NFSv4 private file state info.
Trond Myklebust [Mon, 23 Aug 2004 14:18:16 +0000 (10:18 -0400)]
NFSv2: In the NFSv3 RFC, the sattr3 structure passed in the SETATTR
call allows for the client to request that the mtime and/or atime
of an inode be set to the current server time, the given (client)
time, or not changed. The set-to-current-server value is used
when you run "touch file" on the client.
The NFSv2 RFC defines no such encoding for the sattr structure.
However, Solaris and Irix machines obey a convention where passing
the invalid value mtime.useconds=1000000 means "set both mtime and
atime to the current server time". The convention is documented
in the book "NFS Illustrated" by Brent Callaghan. The patch below
implements this convention for the Linux client and server (hence
multiple To:s).
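The convention can be sketched in a few lines of C. This is an illustration only: the struct and function names below are hypothetical and are not the kernel's. Only the marker value mtime.useconds=1000000 comes from the text; the seconds value sent alongside it is not specified there, so zero is an arbitrary choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an NFSv2 time value (seconds + microseconds). */
struct nfs2_time_sketch {
    uint32_t seconds;
    uint32_t useconds;
};

/* 1000000 is out of range for a microsecond field (valid: 0..999999),
 * which is what lets it double as a marker. */
#define NFS2_SERVER_TIME_USEC 1000000u

/* Client side: ask the server to stamp the file with its own clock.
 * Assumption: seconds is set to 0 here; the text does not say what to send. */
void encode_set_to_server_time(struct nfs2_time_sketch *mtime)
{
    mtime->seconds = 0;
    mtime->useconds = NFS2_SERVER_TIME_USEC;
}

/* Server side: detect the marker before treating the value as a real time. */
bool is_set_to_server_time(const struct nfs2_time_sketch *mtime)
{
    return mtime->useconds == NFS2_SERVER_TIME_USEC;
}
```

A server that sees the marker would set both mtime and atime from its own clock instead of decoding the field as a timestamp.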
Trond Myklebust [Mon, 23 Aug 2004 14:17:20 +0000 (10:17 -0400)]
KCONFIG: In the kernel help for NFSv3 & NFSv4 client support both are
listed as "the newer version ... of the NFS protocol". Obviously
both can't be the newer version at the same time, so here's a
patch to correct the text in such a way that only v4 is listed as
the newer version. Patch is against 2.6.7-rc3 - please consider
including it.
Trond Myklebust [Mon, 23 Aug 2004 14:16:26 +0000 (10:16 -0400)]
NFS: Now that file handle comparison ignores the unused parts of the
file handle container, there is no longer any need to clear the
file handle container before copying in a file handle. This
allows us to remove a 128 byte memset() from several hot paths.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Trond Myklebust [Mon, 23 Aug 2004 14:15:49 +0000 (10:15 -0400)]
NFS: While the storage container for NFS file handles must be able to
store 128 bytes, usually NFS servers don't use file handles that
are more than 32 bytes in size. This patch creates an efficient
mechanism for comparing file handles that ignores the unused bytes
in a file handle.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
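A minimal sketch of a length-aware file handle comparison, assuming a container that stores a size next to a 128-byte buffer. The type and function names are hypothetical and not taken from the patch. Because the comparison never reads past the used length, a copy does not need to clear the container first, which is the point of the memset() removal in the entry above.

```c
#include <stdbool.h>
#include <string.h>

#define FH_MAXSIZE_SKETCH 128   /* container capacity, per the text */

/* Hypothetical file handle container: a length plus a fixed-size buffer. */
struct fhandle_sketch {
    unsigned short size;                    /* bytes of data[] in use */
    unsigned char data[FH_MAXSIZE_SKETCH];
};

/* Equal if the lengths match and the used bytes match.
 * Bytes past size are never read, so stale contents there are harmless. */
bool fh_equal(const struct fhandle_sketch *a, const struct fhandle_sketch *b)
{
    return a->size == b->size && memcmp(a->data, b->data, a->size) == 0;
}

/* Copy only the used bytes; no memset() of the destination is needed.
 * Callers are assumed to keep size <= FH_MAXSIZE_SKETCH. */
void fh_copy(struct fhandle_sketch *dst, const struct fhandle_sketch *src)
{
    dst->size = src->size;
    memcpy(dst->data, src->data, src->size);
}
```

With a typical 32-byte handle, the comparison touches 32 bytes instead of 128.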
Trond Myklebust [Mon, 23 Aug 2004 14:15:13 +0000 (10:15 -0400)]
NFS: In 2.4, NFS O_DIRECT used the VFS's O_DIRECT logic to provide
direct I/O support for NFS files. The 2.4 VFS O_DIRECT logic was
block based, thus the NFS client had to provide a minimum
allowable blocksize for O_DIRECT reads and writes on NFS files.
For various reasons we chose 512 bytes. In 2.6, there is no
requirement for a minimum blocksize. NFS O_DIRECT reads and
writes can go to any byte at any offset in a file. Thus we revert
the blocksize setting for NFS file systems to the previous
behavior, which was to advertise the "wsize" setting as the
optimal I/O block size. This improves the performance of
applications like 'cp' which use this value as their transfer
size.
This patch also exposes the server's reported disk block size in the
f_frsize of the vfsstat structure.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
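From user space, values of this kind are visible through stat(2) and statvfs(3). The sketch below assumes a POSIX system, and the helper names are made up for illustration. That the advertised optimal I/O block size surfaces as st_blksize is an assumption about how applications such as 'cp' read it; the patch text itself only names f_frsize.

```c
#include <sys/stat.h>
#include <sys/statvfs.h>

/* Preferred I/O block size for the file system holding path, or -1 on error.
 * Assumption: this is the value tools like cp use as their transfer size. */
long preferred_io_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}

/* Fragment size (f_frsize) reported for the file system, or -1 on error. */
long fragment_size(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;
    return (long)vfs.f_frsize;
}
```

Calling both helpers on a path inside an NFS mount should show the two values independently of each other.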
Trond Myklebust [Mon, 23 Aug 2004 14:13:19 +0000 (10:13 -0400)]
NFS: Break the nfs_wreq_lock into per-mount locks. This helps prevent
a heavy read and write workload on one mount point from
interfering with workloads on other mount points.
Note that there is still some serialization due to the big kernel
lock.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Trond Myklebust [Mon, 23 Aug 2004 14:12:24 +0000 (10:12 -0400)]
RPCSEC_GSS: Add the spkm3 common and client-side code.
Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Trond Myklebust [Mon, 23 Aug 2004 14:10:18 +0000 (10:10 -0400)]
NFSv4: OK, so it's trivial and probably superfluous, but I don't see
why we shouldn't be slightly stricter here, so I'm just going to
keep sending this until I'm told to stop.... Make sure that
unmapped errors are approximately in the range of defined NFS4
errors.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Trond Myklebust [Mon, 23 Aug 2004 14:01:57 +0000 (10:01 -0400)]
RPC: Reduce stack utilization for all synchronous NFS operations by
using a dynamically allocated rpc_task structure instead of
allocating one on the stack. This reduces stack utilization by
over 200 bytes for all synchronous NFS operations.
Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
David S. Miller [Mon, 23 Aug 2004 07:34:58 +0000 (00:34 -0700)]
[SPARC64]: Fix bugs in new U1memcpy code.
- U1copy_from_user needs PREAMBLE since it uses
explicit ASI_BLK_AIUS references.
- Need to use EX_RETVAL() in U1memcpy.S
- U1memcpy.S can load one 64-bit word too
many, passing the source buffer boundary
and thus potentially causing exceptions.
David S. Miller [Mon, 23 Aug 2004 07:33:47 +0000 (00:33 -0700)]
[SPARC64]: Revamped memcpy infrastructure.
- Make it easier to maintain the Ultra-I vs. Ultra-III
memcpy implementations. Before you had to maintain
3 different entire copies of the routines.
- Kill %asi register writing Ultra-I single memcpy loop
for both user and kernel. Was not worth it.
- Simplify exception detection and handling enormously.
Trond Myklebust [Mon, 23 Aug 2004 07:14:37 +0000 (00:14 -0700)]
[PATCH] Fix posix file locking (9/9)
NFSv2/v3: Fix up a race in the case where the user presses ^C while a
process is in the middle of setting up a posix lock. In case the
server registered our lock, we need to make sure that it gets
cleaned up during the resulting file close().
Trond Myklebust [Mon, 23 Aug 2004 07:14:14 +0000 (00:14 -0700)]
[PATCH] Fix posix file locking (7/9)
VFS,CIFS,NLM,NFSv4: make filesystems directly responsible for calling
posix_lock_file() if they need it. This fixes an NFS race whereby
in case of a server reboot, the recovery thread could re-establish
a lock that had just been freed.
Trond Myklebust [Mon, 23 Aug 2004 07:13:51 +0000 (00:13 -0700)]
[PATCH] Fix posix file locking (5/9)
NLM: file_lock->fl_owner may live for longer than the pid of the
original process that created it. Fix NFSv2/v3 client locking code
to map file_lock->fl_owner into a unique 32-bit number or
"pseudo-pid".
There's a misplaced check returning error for
hpt_minimum_revision(dev,8) == TRUE still there, making the previous
fixes useless for the early revision HPT cards.
Adrian Bunk [Mon, 23 Aug 2004 06:09:03 +0000 (23:09 -0700)]
[PATCH] cciss /proc dependency fix
cciss uses /proc to hook into the SCSI subsystem. If you do not build
/proc support into your kernel then you should also disable tape support in
the driver.
Signed-off-by: Adrian Bunk <bunk@fs.tum.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mike Miller [Mon, 23 Aug 2004 06:08:30 +0000 (23:08 -0700)]
[PATCH] cciss: pdev->intr fix
This patch fixes our usage of pdev->intr. We were truncating it to an unchar.
We were also reading it before calling pci_enable_device. This patch fixes
both of those. Thanks to Bjorn Helgaas for the patch.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
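The truncation half of this fix is easy to reproduce outside the kernel. A small sketch, with hypothetical function names, of what happens when an interrupt number is stored in an 8-bit field:

```c
/* The kernel defines unchar as a typedef for unsigned char; it is
 * redefined here so the sketch stands alone. */
typedef unsigned char unchar;

/* Storing an interrupt number in an 8-bit field silently drops the
 * high bits: only irq % 256 survives. */
unchar store_irq_truncated(unsigned int irq)
{
    return (unchar)irq;
}

/* Storing it in a field as wide as the source keeps the full value. */
unsigned int store_irq_full(unsigned int irq)
{
    return irq;
}
```

Interrupt numbers below 256 survive either way, so the problem only shows up on systems that hand out larger numbers.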
Mike Miller [Mon, 23 Aug 2004 06:07:44 +0000 (23:07 -0700)]
[PATCH] cciss: /proc fixes
This patch fixes our output in /proc to display the logical volume sizes and
RAID levels correctly. Without this patch RAID level will always be 0 and
size may be displayed as 0GB.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mike Miller [Mon, 23 Aug 2004 06:07:33 +0000 (23:07 -0700)]
[PATCH] cciss: zero out buffer in passthru ioctls for HP utilities
This patch addresses a problem with our utilities. We must zero out the
buffer before copying their data into it to prevent bogus info when switching
between SCSI & SATA or SAS drives.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
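The pattern is simply "clear, then copy". A sketch under assumed names (the function and its parameters are hypothetical and do not reflect the driver's actual ioctl code):

```c
#include <stddef.h>
#include <string.h>

/* Fill a reusable buffer from caller data. Clearing the whole buffer
 * first ensures bytes beyond src_len never carry data from an earlier
 * request. */
void fill_passthru_buf(unsigned char *buf, size_t buf_len,
                       const unsigned char *src, size_t src_len)
{
    if (src_len > buf_len)
        src_len = buf_len;      /* never write past the buffer */
    memset(buf, 0, buf_len);
    memcpy(buf, src, src_len);
}
```

Without the memset(), a short request that follows a long one would leave the tail of the earlier data in place.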
Anton Blanchard [Mon, 23 Aug 2004 06:07:10 +0000 (23:07 -0700)]
[PATCH] Fix gcc 3.5 compile issue in mm/mempolicy.c
Fix another gcc 3.5 compile issue, this time the default_policy prototype
was not marked static whereas the definition was. There is no need for
the prototype, so remove it.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rik van Riel [Mon, 23 Aug 2004 06:06:58 +0000 (23:06 -0700)]
[PATCH] increase per-user mlock limit default to 32k
Since various gnupg users have indicated that gpg wants to mlock 32kB of
memory, I created the patch below that increases the default mlock ulimit
to 32kB.
This is no security problem because it's trivial for processes to lock way
more memory than this in page tables, network buffers, etc. In fact, since
this patch allows gnupg to mlock to prevent passphrase data from being
swapped out, the security people will probably like it ;)
This gets the new per-user mlock limit a bit more testing, too.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rik van Riel [Mon, 23 Aug 2004 06:06:46 +0000 (23:06 -0700)]
[PATCH] rlimit-based mlocks for unprivileged users
Here is the last agreed-on patch that lets normal users mlock pages up to
their rlimit. This patch addresses all the issues brought up by Chris and
Andrea.
From: Chris Wright <chrisw@osdl.org>
Couple more nits.
The default lockable amount is one page now (in the first patch it was 0). Why
don't we keep it as 0, with the CAP_IPC_LOCK overrides in place? That way
nothing is changed from user perspective, and the rest of the policy can be
done by userspace as it should.
This patch breaks in one scenario. When ulimit == 0, process has
CAP_IPC_LOCK, and does SHM_LOCK. The subsequent unlock or destroy will
corrupt the locked_shm count.
It's also inconsistent in handling user_can_mlock/CAP_IPC_LOCK interaction
between shm_lock and shm_hugetlb.
SHM_HUGETLB can now only be done by the shm_group or CAP_IPC_LOCK.
Not any can_do_mlock() user.
Double check of can_do_mlock isn't needed in SHM_LOCK path.
Interface names user_can_mlock and user_substract_mlock could be better.
Incremental update below. Ran some simple sanity tests on this plus my
patch below and didn't find any problems.
* Make default RLIM_MEMLOCK limit 0.
* Move CAP_IPC_LOCK check into user_can_mlock to be consistent
and fix bug with ulimit == 0 && CAP_IPC_LOCK with SHM_LOCK.
* Allow can_do_mlock() user to try SHM_HUGETLB setup.
* Remove unnecessary extra can_do_mlock() test in shmem_lock().
* Rename user_can_mlock to user_shm_lock and user_subtract_mlock
to user_shm_unlock.
* Use user instead of current->user to fit in 80 cols on SHM_LOCK.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
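The ulimit == 0 plus CAP_IPC_LOCK problem comes down to keeping the lock and unlock accounting symmetric. A sketch of that idea, with hypothetical names and a plain byte counter standing in for the per-user state (this is not the kernel's code): if the capability check lives inside the same function that charges the user, every successful lock is charged, so the later unlock can subtract without corrupting the count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user accounting state. */
struct user_acct_sketch {
    size_t locked_bytes;    /* bytes currently locked by this user */
};

/* Returns true and charges the user if the request is allowed.
 * The capability override is checked here, next to the limit check,
 * so a privileged lock with rlimit == 0 is still accounted for. */
bool acct_lock(struct user_acct_sketch *u, size_t bytes,
               size_t rlimit, bool has_cap_ipc_lock)
{
    bool allowed = has_cap_ipc_lock ||
                   (bytes <= rlimit && u->locked_bytes <= rlimit - bytes);

    if (allowed)
        u->locked_bytes += bytes;
    return allowed;
}

/* Safe only because every successful acct_lock() charged the user. */
void acct_unlock(struct user_acct_sketch *u, size_t bytes)
{
    u->locked_bytes -= bytes;
}
```

If the capability check were done by the caller and the charge skipped, acct_unlock() would subtract from a count that was never incremented, which is the kind of corruption described above.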
Ram Pai [Mon, 23 Aug 2004 06:06:34 +0000 (23:06 -0700)]
[PATCH] readahead fixes
Here is a consolidated readahead patch that takes care of the performance
regression seen with multiple threaded writes to the same file descriptor.
The patch does the following:
1. Instead of calculating the average count of sequential
access in the read patterns, it calculates the
average amount of hits in the current window.
2. This average is used to guide the size of the next current
window.
3. Since the field serial_cnt in the ra structure does not
make sense with the introduction of the new logic,
I have renamed that field as currnt_wnd_hit.
This patch will help the read patterns that are not necessarily sequential
but have sufficient locality. However it may regress random workload.
Results:
1. Berkley Shands has reported great performance with this
patch.
2. iozone showed negligible effect on various read patterns.
3. DSS workload saw negligible change.
4. Sysbench saw a small improvement.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ram Pai [Mon, 23 Aug 2004 06:06:22 +0000 (23:06 -0700)]
[PATCH] readahead: simplify recent fixes
Ok I have enclosed the results for the recent readahead fixes. The summary
is: there is no significant improvement or decrease in performance for the
DSS workload, iozone, or sysbench. Any increase or decrease is within the
margin of error.
I have enclosed a patch that partially backs off Miklos's fix. Shane
Shrybman correctly pointed out that the real fix is to set ra->average
value to max/2 when we move from readahead-off mode to readahead-on mode.
The other part of Miklos's fix becomes irrelevant.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zou Nanhai [Mon, 23 Aug 2004 06:05:48 +0000 (23:05 -0700)]
[PATCH] fix might-sleep-in-atomic while dumping elf
Here is a patch to fix a problem of might-sleep-in-atomic which David
Mosberger mentioned at
http://www.gelato.unsw.edu.au/linux-ia64/0407/10526.html
On the IA64 platform, a might-sleep-in-atomic warning is raised while dumping a
multi-threaded process. That is because elf_core_dump holds the tasklist_lock
while the kernel calls access_process_vm in elf_core_copy_task_regs.
This patch moves the elf_core_copy_task_regs call out from under
tasklist_lock to remove the warning.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] aio.c: rename 'struct timeout' to 'struct aio_timeout'
This patch renames fs/aio.c:'struct timeout' to 'struct aio_timeout'. The
rationale is that this type is used only inside aio.c, and since the type
name is very generic, it is likely to cause namespace conflicts in the
future.
I actually found it while working on an extended schedule_timeout()-like
API used by robust mutexes but usable by anyone. There I declared a
'struct timeout' and aio.c complained about it. I could have also renamed
the struct for the schedule_timeout()-like API, but since the aio.c one is
specific to the file, I thought it made more sense to rename the
latter.
Matt Mackall [Mon, 23 Aug 2004 06:05:13 +0000 (23:05 -0700)]
[PATCH] Fix netpoll cleanup on abort without dev
If netpoll attempts to use a device without polling support, it will oops
when shutting down. This adds a check that we've actually attached to a
device.
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Mon, 23 Aug 2004 06:04:49 +0000 (23:04 -0700)]
[PATCH] mark IS_ERR as unlikely()
It seems fair to assume that it is always unlikely that IS_ERR will return
true.
This patch changes the gcc-3.4-generated kernel text by ~500 bytes (less) so
it's fair to assume that the compiler is indeed propagating unlikeliness out
of inline functions.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
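For readers unfamiliar with the idiom, here is a user-space sketch of an error-pointer check wrapped in a branch hint. It mirrors the shape of the kernel helpers but is not copied from them: the MAX_ERRNO bound of 4095 is an assumption chosen for the sketch, and __builtin_expect is the gcc builtin that a macro like unlikely() is typically built on.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Assumed bound: the top MAX_ERRNO addresses are reserved for error codes. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr falls in the reserved error range. Because the hint sits
 * inside the inline function, each caller's error path can be laid out
 * as the cold branch. */
static inline bool IS_ERR(const void *ptr)
{
    return unlikely((uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO);
}
```

A caller then writes "if (IS_ERR(p)) return PTR_ERR(p);" and the compiler can keep the success path as the straight-line fall-through.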
Badari Pulavarty [Mon, 23 Aug 2004 06:04:37 +0000 (23:04 -0700)]
[PATCH] DIO pages-in-io accounting fix
I found one more accounting inconsistency with dio_pages_in_io. This is a
day-one bug and I started hitting it on latest -mm due to the recent
changes to dio_pages_in_io calculations to be exact.
If the file is badly fragmented (no contiguous blocks at all), and the user
buffer is not page aligned - we need to create IO for each disk block with
2 pages. (bio with 2 vecs).
dio_bio_add_page() should not decrement dio_pages_in_io for every page added.
It should decrement it only when it is done with that page and is moving on
to the next page (since dio_pages_in_io represents how many actual pages we
are operating on).
Here is the patch to fix this accounting. Without this patch, we will hit
BUG() in dio_new_bio() with O_DIRECT on filesystems.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Corey Minyard [Mon, 23 Aug 2004 06:04:14 +0000 (23:04 -0700)]
[PATCH] IPMI driver updates
Some people found some bugs and some missing functions in the IPMI driver,
so I have been patching things together for the next release. The attached
patch moves to version 33 of the driver and contains:
* SMBIOS table support for specifying register spacing. This allows
non-contiguous registers to be specified and some machines do
this.
* ACPI table updates to support all the possible register sizes and
bit offsets into the registers for the IPMI information.
* Support for command line parameters to specify register
spacing, sizes, and bit offsets.
* Support for power control with IPMI. This allows a halt to
power down a machine with IPMI.
* A fix for a race condition with interrupts enabled on an
SMP machine. A lock was released then reclaimed, but
there was code later that assumed that had not happened.
* A fix for protecting the driver against bad responses from
the controller chip. In the past, the driver had assumed that
the controller chip would not give it bad data. This has
turned out to be a bad assumption.
* ACPI interrupt handlers now return a return value, adjust
accordingly.
Thank you to all the people who helped me with this.
Paul Clements [Mon, 23 Aug 2004 06:03:50 +0000 (23:03 -0700)]
[PATCH] nbd: fix struct request race condition
Here's a patch to fix a race condition in nbd that was causing struct
request corruption (requests were being freed while still in use). This
patch improves on the previous one, which admittedly was a bit dodgy, using
struct request's ref_count field (I should have listened to Jens in the
first place :). This should fix all the corner cases related to struct
request usage/freeing in nbd. My stress tests do a lot better with this
patch applied.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nuke the real undefined symbol in sparc32. This is the only real hit from
ldchk on sparc32; the rest are all btfixup-related (Sam Ravnborg and I are
working on addressing that).
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch kills two printks from UDF that announce its registration and
unregistration. Since one can determine which filesystems are present by
examining /proc/filesystems, these messages strike me as noise.
Signed-off-by: Sean Neakums <sneakums@zork.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>