Andrew Morton [Tue, 20 Apr 2004 00:59:56 +0000 (17:59 -0700)]
[PATCH] MIPS update
From: Ralf Baechle <ralf@linux-mips.org>
- more work on resurrecting AMD Alchemy platforms
- cleanup of unnecessary <asm/pgalloc.h> inclusions
- update default config files
- cleanup 32-bit compat ioctl code
- support for Momentum Jaguar ATX
- workarounds for early revs of the RM9000
- fixes for RM5000 and RM7000 cache handling
- add support for PMC-Sierra Yosemite eval board
- further cleanup and bugfixes for SGI IP27
- make LASAT and VR41xx build and work in 2.6
- improved SGI IP32 support
- plenty of small fixes
Andrew Morton [Tue, 20 Apr 2004 00:23:00 +0000 (17:23 -0700)]
[PATCH] MIPS: don't offer SERIAL_DZ on 64-bit DEC
From: Ralf Baechle <ralf@linux-mips.org>
Limit the DZ driver to MIPS32 as the supported hardware is only present in
R2k/R3k-based systems (unless someone sends Maciej a PMAC-A board for driver
development).
Andrew Morton [Tue, 20 Apr 2004 00:22:03 +0000 (17:22 -0700)]
[PATCH] fix madvise(MADV_DONTNEED) for nonlinear vmas
From: Hugh Dickins <hugh@veritas.com>
Jamie points out that madvise(MADV_DONTNEED) should unmap pages from a
nonlinear area in such a way that the nonlinear offsets are preserved if the
pages do turn out to be needed later after all, instead of reverting them to
linearity: needs to pass down a zap_details block.
(But this still leaves mincore unaware of nonlinear vmas: bigger job.)
Andrew Morton [Tue, 20 Apr 2004 00:21:51 +0000 (17:21 -0700)]
[PATCH] reiserfs use-after-free fix
From: Chris Mason <mason@suse.com>
reiserfs-delayed-work started using queue_delayed_work, but did not make sure
the timer was finished before it freed the work queue structs during unmount.
This leads to timer oopsen if you unmount at just the right time.
Roland McGrath [Tue, 20 Apr 2004 00:20:06 +0000 (17:20 -0700)]
[PATCH] fix for potential deadlock after posix-timers change
Ulrich has been working on the glibc code using posix-timers and
stressing it more now than it has been before. He ran into an SMP deadlock
on process exit in the case there are pending queued signals from a
timer.
The deadlock arises because in the path through exit_itimers, the
tasklist_lock is already held (for writing). When a timer is being
deleted, sigqueue_free will try to take it (for reading) in the case
where that timer has a pending signal queued on somebody's queue. This
patch avoids the problem by making sure the queues are flushed before
calling exit_itimers, thus ensuring its code path won't try to take
tasklist_lock.
My message queue patch fixes the 64 bits -> 32 bits conversion of
siginfo, but didn't change the 32 -> 64 bits conversion done in
sys32_rt_sigqueueinfo() which was apparently bogus as well.
After much discussion & debate on the right way of converting that
structure, I decided to go the sparc64 / s390 way, and not the x86_64
way, that is to copy the various unions data "as is". This guarantees
that whatever a 32 bits app passes there, another 32 bits app will
understand it. Crossover between 32 and 64 bits apps on such things
as home-made userland siginfo isn't something we can help with anyway.
The x86_64 choice of converting as if it was an RT signal, thus
converting the sigval, cannot easily be applied to big endian archs
since the sigval is a union of a ptr and an int: on BE, the int
happens to be on the wrong half of the 64 bits ptr, thus we can't
do a simple conversion.
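The layout problem described above can be illustrated with a small userspace sketch. It is not the kernel conversion code: `sk_ptr_view_of_int` is a hypothetical helper that builds the 8-byte image of a 64-bit sigval-like union by hand, stores a 32-bit int at offset 0 in the requested byte order, and then reads the same 8 bytes back as the 64-bit ptr member.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: model a union { void *ptr; int i; } on a 64-bit
 * machine as 8 raw bytes. Store the int member (offset 0, 4 bytes) in the
 * given byte order, then read the whole image as the 64-bit ptr member.
 */
static uint64_t sk_ptr_view_of_int(uint32_t v, int big_endian)
{
    uint8_t image[8] = { 0 };
    uint64_t out = 0;
    int i;

    assert(big_endian == 0 || big_endian == 1);

    /* Write the 32-bit int at offset 0. */
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        image[i] = (uint8_t)(v >> shift);
    }

    /* Read all 8 bytes back as one 64-bit value. */
    for (i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        out |= (uint64_t)image[i] << shift;
    }
    return out;
}
```

On the little endian model the int shows up in the low half of the ptr, so widening the ptr keeps the int readable. On the big endian model the int occupies the high half, so a conversion that treats the field as a ptr and one that treats it as an int disagree.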
init_ide_data() initializes default IDE interfaces but without default IRQ
(hwif->irq and hwif->hw.irq fields) so introduce ide_init_default_irq() and
remove redundant ide_init_default_hwifs() (except arm26 and arm ones).
As a side-effect it fixes:
- CONFIG_BLK_DEV_HD_IDE if !CONFIG_BLK_DEV_IDEPCI (i386)
- hwif->noprobe shouldn't be 0 if !hwif->io_ports[IDE_DATA_OFFSET]
(alpha, i386, ia64, mips, sh, x86_64)
The following patch allows hw_random.c to build on ia64. (The problem
was just that the VIA stuff has i386 assembly in it. The current code
only probes for VIA on i386 anyway, so this patch just adds more ifdefs
so the VIA code is only built for i386.)
Pavel Roskin [Mon, 19 Apr 2004 09:04:06 +0000 (05:04 -0400)]
[PATCH] Tulip endianess fix
My tulip ethernet card doesn't work on Blue&White G3 PowerMac with Linux
2.6.5-rc2. The card is shown by lspci as
01:03.0 Ethernet controller: Linksys Network Everywhere Fast Ethernet
10/100 model NC100 (rev 11)
The kernel detects it as "ADMtek Comet rev 17".
The MAC address reported by the kernel looked obviously wrong. Also, I
could only ping the system successfully if the interface was in promiscuous
mode (running Ethereal).
Those two symptoms indicated two different problems - one for reading the
MAC address from the card on module load (tulip_init_one), and the other
for writing the address to the card when the interface was brought up
(tulip_up). I have fixed both, and here's the explanation:
tulip_init_one:
When reading the first 4 bytes of the address, inl() returns the same data
to the CPU on all platforms, interpreting the data from the lowest port
address as the least significant byte. In other words, I/O is little
endian on all platforms; it's the memory that differs across platforms.
We want to write the data to memory preserving little-endianness of the
PCI bus. To force little endian write to the memory, the data should be
converted to the little endian format.
When reading the remaining 2 bytes, the CPU gets them in 2 least
significant bytes. To write those 2 bytes to the memory in a 16-bit
operation, they should be byte-swapped for the 16-bit operation.
tulip_up:
The first 4 bytes are processed correctly, but the code is confusing.
Reading from memory needs conversion to CPU format, while writing to I/O
ports doesn't. So I replaced cpu_to_le32() with le32_to_cpu().
The second 2 bytes are read in a 16-bit memory operation, so they should
be passed to le16_to_cpu() rather than cpu_to_le32() to make them CPU
independent and suitable for outl().
All those conversions do nothing on little-endian machines, so they should
not be affected.
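A userspace sketch of the tulip_init_one reasoning, under stated assumptions: `sketch_cpu_to_le32()` and `sketch_cpu_to_le16()` are stand-ins for the kernel helpers of similar name, `store_mac()` is a hypothetical function, and the two register values are passed in as plain integers instead of being read with inl(). It shows that converting to little endian before the memory write yields the same byte order in memory on either kind of host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime endianness probe, so the sketch behaves the same on any host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Stand-in for cpu_to_le32(): a no-op on little endian hosts. */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
    if (!host_is_big_endian())
        return v;
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Stand-in for cpu_to_le16(): a no-op on little endian hosts. */
static uint16_t sketch_cpu_to_le16(uint16_t v)
{
    if (!host_is_big_endian())
        return v;
    return (uint16_t)((v >> 8) | (v << 8));
}

/*
 * lo32: value as the CPU sees it for the first 4 address bytes
 *       (lowest port address in the least significant byte).
 * hi16: the remaining 2 bytes, in the 2 least significant bytes.
 */
static void store_mac(uint8_t mac[6], uint32_t lo32, uint16_t hi16)
{
    uint32_t w = sketch_cpu_to_le32(lo32);
    uint16_t h = sketch_cpu_to_le16(hi16);

    assert(mac != NULL);
    memcpy(mac, &w, 4);     /* 32-bit memory write */
    memcpy(mac + 4, &h, 2); /* 16-bit memory write */
}
```

Calling `store_mac(mac, 0x44332211, 0x6655)` leaves the bytes 11 22 33 44 55 66 in memory whether the host is little or big endian, which is the property the patch description relies on.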
The patch has been tested. The driver is working fine. ping is OK, ssh
is OK, X11 over ssh is OK. Even netconsole is working fine.
Russell King [Mon, 19 Apr 2004 08:50:20 +0000 (04:50 -0400)]
[PATCH] fix arm/etherh.c
On Tue, Apr 13, 2004 at 02:35:40PM -0400, Jeff Garzik wrote:
> Russell,
>
> Would you be willing to provide an updated diff of this?
I didn't particularly like the PRIV() method implemented previously -
gcc appears to want to avoid some optimisations if it's an inline
function rather than a macro.
Also, 'ei_local' may look unused in some functions, but it's your
typical hidden-use-in-a-macro crap which 8390 likes.
In systems with mixed network cards and all drivers compiled into
the kernel, the PCI device (eth0) will get probed first, before the ISA.
The problem is that the ISA device can mistakenly try to probe
for eth0. The ISA driver will not detect the failure
until it goes to call register_netdevice, and not all drivers have
perfect error unwind code.
This patch short circuits the device probe, so it won't bother
looking for devices that already are registered.
Adrian Bunk [Mon, 19 Apr 2004 08:43:04 +0000 (04:43 -0400)]
[PATCH] fix warning in drivers/net/tulip/timer.c
I get the following warning in 2.6.5-mm6 and 2.6.6-rc1:
<-- snip -->
...
CC drivers/net/tulip/timer.o
drivers/net/tulip/timer.c: In function `comet_timer':
drivers/net/tulip/timer.c:156: warning: unused variable `ioaddr'
...
<-- snip -->
Since the
[netdrvr tulip] add MII support for Comet chips
patch has removed the only use of this variable, the fix is simple:
Chris Wright [Mon, 19 Apr 2004 08:26:30 +0000 (04:26 -0400)]
[PATCH] wan sdla: fix probable security hole
> [BUG] minor
> /home/kash/linux/linux-2.6.5/drivers/net/wan/sdla.c:1206:sdla_xfer:
> ERROR:TAINT: 1201:1206:Passing unbounded user value "(mem).len" as arg 0
> to function "kmalloc", which uses it unsafely in model
> [SOURCE_MODEL=(lib,copy_from_user,user,taintscalar)]
> [SINK_MODEL=(lib,kmalloc,user,trustingsink)] [MINOR] [PATH=] [Also
> used at, line 1219 in argument 0 to function "kmalloc"]
> static int sdla_xfer(struct net_device *dev, struct sdla_mem *info, int
> read)
> {
> struct sdla_mem mem;
> char *temp;
>
> Start --->
> if(copy_from_user(&mem, info, sizeof(mem)))
> return -EFAULT;
>
> if (read)
> {
> Error --->
> temp = kmalloc(mem.len, GFP_KERNEL);
> if (!temp)
> return(-ENOMEM);
> sdla_read(dev, mem.addr, temp, mem.len);
Hrm, I believe you could use this to read 128k of kernel memory.
sdla_read() takes len as a short, whereas mem.len is an int. So,
if mem.len == 0x20000, the allocation could still succeed. When cast
to short, len will be 0x0, causing the read loop to copy nothing into
the buffer. At least it's protected by a capable() check. I don't
know what the proper upper bound is for this hardware, or how much it's
used/cared about. A simple memset() is a trivial fix.
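The truncation described above can be reproduced in userspace. This is only a sketch: `sketch_read()` and `sketch_xfer()` are hypothetical stand-ins for sdla_read() and sdla_xfer(), and the int to short conversion is implementation-defined in C (gcc and clang keep the low 16 bits, which is assumed here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for sdla_read(): like the driver, it takes the length as a short. */
static size_t sketch_read(char *dst, const char *src, short len)
{
    size_t copied = 0;

    while (len > 0) {
        *dst++ = *src++;
        len--;
        copied++;
    }
    return copied;
}

/*
 * Stand-in for the read side of sdla_xfer(), with the memset() fix applied:
 * the buffer is zeroed first, so if the short length wraps to 0 and nothing
 * is copied, no stale buffer contents can be handed back to the caller.
 */
static size_t sketch_xfer(char *buf, size_t buf_size, const char *hw,
                          int user_len)
{
    assert(buf != NULL && hw != NULL);
    memset(buf, 0, buf_size);
    /* 0x20000 does not fit in a short; with gcc/clang the result is 0. */
    return sketch_read(buf, hw, (short)user_len);
}
```

With `user_len == 0x20000` the read loop copies nothing, so without the memset() the buffer would go back to the caller holding whatever it contained before.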
Andrew Morton [Mon, 19 Apr 2004 05:06:30 +0000 (22:06 -0700)]
[PATCH] From: David Gibson <david@gibson.dropbear.id.au>
hugepage_vma() is both misleadingly named and unnecessary. On most archs it
always returns NULL, and on IA64 the vma it returns is never used. The
function's real purpose is to determine whether the address it is passed is a
special hugepage address which must be looked up in hugepage pagetables,
rather than being looked up in the normal pagetables (which might have
specially marked hugepage PMDs or PTEs).
This patch kills off hugepage_vma() and folds the logic it really needs into
follow_huge_addr(). That now returns a (page *) if called on a special
hugepage address, and an error encoded with ERR_PTR otherwise. This also
requires tweaking the IA64 code to check that the hugepage PTE is present in
follow_huge_addr() - previously this was guaranteed, since it was only called
if the address was in an existing hugepage VMA, and hugepages are always
prefaulted.
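The return convention described above (a page pointer for a special hugepage address, an ERR_PTR-encoded error otherwise) can be sketched in userspace as follows. All names here are hypothetical: the `sk_` helpers imitate the kernel's ERR_PTR/PTR_ERR/IS_ERR idea, the address window is invented, and returning NULL for a not-present PTE is an assumption, not something the message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (imitates ERR_PTR). */
static void *sk_err_ptr(long err)
{
    assert(err < 0 && err >= -SK_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Recover the errno from an encoded pointer (imitates PTR_ERR). */
static long sk_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer lies in the top SK_MAX_ERRNO values (imitates IS_ERR). */
static int sk_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

struct sk_page { int id; };
static struct sk_page sk_huge_page = { 42 };

/* Invented window standing in for "special hugepage addresses". */
#define SK_HUGE_START 0x40000000ul
#define SK_HUGE_END   0x50000000ul

static struct sk_page *sk_follow_huge_addr(unsigned long addr, int present)
{
    if (addr < SK_HUGE_START || addr >= SK_HUGE_END)
        return sk_err_ptr(-EINVAL); /* caller falls back to normal pagetables */
    if (!present)
        return NULL;                /* assumption: PTE not present */
    return &sk_huge_page;
}
```

A caller checks `sk_is_err()` first to decide whether to fall back to the normal pagetable walk, and only then looks at the returned page.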
Andrew Morton [Mon, 19 Apr 2004 05:06:16 +0000 (22:06 -0700)]
[PATCH] Fix default value for commit interval for older reiserfs filesystems.
From: Bart Samwel <bart@samwel.tk>
The reiserfs patch that adds support for "commit=0" saves the default max
commit age in a variable when the fs is originally mounted, so that it can
later restore it. Unfortunately it makes some mistakes with that:
- The default is not saved when the original mount has a commit=NNN option.
- The default is not correctly saved for older reiserfs filesystems, where
the default was not stored on disk.
Andrew Morton [Mon, 19 Apr 2004 05:05:51 +0000 (22:05 -0700)]
[PATCH] Increase number of dynamic inodes in procfs
From: Nathan Lynch <nathanl@austin.ibm.com>
On some larger ppc64 configurations /proc/device-tree is exhausting procfs'
dynamic (non-pid) inode range (16K). This patch makes the dynamic inode
range 0xf0000000-0xffffffff and changes the inode number allocator to use
the idr.c allocator for the first-fit allocations.
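For scale, the new range holds 0x10000000 (about 268 million) inode numbers against the old 16K. The sketch below illustrates first-fit allocation over such a range. It is not the idr.c allocator: the `sketch_` names are hypothetical and a 64-slot bitmap stands in for the real data structure.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DYNAMIC_FIRST 0xF0000000u
#define SKETCH_DYNAMIC_LAST  0xFFFFFFFFu
#define SKETCH_SLOTS         64u  /* tiny pool, for illustration only */

static uint64_t sketch_used;      /* bit i set => id i is in use */

/* Number of inode numbers in the dynamic range. */
static uint32_t sketch_range_size(void)
{
    return SKETCH_DYNAMIC_LAST - SKETCH_DYNAMIC_FIRST + 1u;
}

/* First-fit: hand out the lowest free id, mapped into the dynamic range. */
static uint32_t sketch_get_ino(void)
{
    for (uint32_t id = 0; id < SKETCH_SLOTS; id++) {
        if (!(sketch_used & (UINT64_C(1) << id))) {
            sketch_used |= UINT64_C(1) << id;
            return SKETCH_DYNAMIC_FIRST + id;
        }
    }
    return 0; /* pool exhausted */
}

/* Release an inode number so that a later allocation can reuse it. */
static void sketch_put_ino(uint32_t ino)
{
    assert(ino >= SKETCH_DYNAMIC_FIRST &&
           ino - SKETCH_DYNAMIC_FIRST < SKETCH_SLOTS);
    sketch_used &= ~(UINT64_C(1) << (ino - SKETCH_DYNAMIC_FIRST));
}
```

Because allocation is first-fit, a released number is the next one handed out if it is the lowest free id, which keeps the numbers in use packed at the bottom of the range.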
A few weeks ago, Pavel and I agreed that PF_IOTHREAD should be renamed to
PF_NOFREEZE. This reflects the fact that some threads so marked aren't
actually used for IO while suspending, but simply shouldn't be frozen.
This patch, against 2.6.5 vanilla, applies that change. In the
refrigerator calls, the actual value doesn't matter (so long as it's
non-zero) and it makes more sense to use PF_FREEZE so I've used that.
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the posix message queue syscalls to ppc32 and 64 and fixes
our implementation of compat copy siginfo to 32 bits userland which wasn't
using the si_code but still doing a switch/case on the signal number.
Russell King [Sun, 18 Apr 2004 23:32:07 +0000 (00:32 +0100)]
[ARM] Clean up ARM includes
This removes a number of unnecessary includes from the ARM specific
files throughout the kernel. Most notably asm/pgalloc.h is
needlessly included in several places. There were some places
including it as a means to get at the cache flushing functions,
so this has been corrected.
This is my brown paper bag day: I sent you the wrong patch for
fixing the deadlock in rtas.c. Here's one to apply on top of current
bk that fixes the build.
Reduce the locking coverage of the oft-used j_list_lock: the per-bh
jbd_lock_bh_state() gives us sufficient locking of buffer_head and
journal_head internals.
Andrew Morton [Sun, 18 Apr 2004 03:55:18 +0000 (20:55 -0700)]
[PATCH] rmap: nonlinear truncation
From: Hugh Dickins <hugh@veritas.com>
The earlier changes introducing PageAnon left truncated pages mapped into
nonlinear vmas unswappable. Once we go to object-based rmap, it's
impossible to find where file page is mapped once page->mapping cleared:
switching them to anonymous is odd, and breaks strict commit accounting.
So now handle truncation of nonlinear vmas correctly. And factor in
Daniel's cluster filesystem needs while we're there: when invalidating
local cache, we do want to unmap shared pages from all mms, but we do not
want to discard private COWed modifications of those pages (which
truncation discards to satisfy the SIGBUS semantics demanded by specs).
Drew from Daniel's patch (LKML 2 Mar 04), but didn't always follow it;
fewer name changes, but still some - "unmap" rather than "invalidate".
zap_page_range is not exported, safe to give it and all the too-many layers
an extra zap_details arg, in normal cases just NULL.
Given details, zap_pte_range checks page mapping or index to skip anon or
untruncated pages. I didn't realize before implementing, that in nonlinear
case, it should set a file pte when truncating - otherwise linear pages
might appear in place of SIGBUS. I suspect this implies that ->populate
functions ought to set file ptes beyond EOF instead of failing, but haven't
changed them as yet.
To avoid making yet another copy of that ugly linear pgidx test, added
inline function linear_page_index (to pagemap.h to get PAGE_CACHE_SIZE,
though as usual things don't really work if it differs from PAGE_SIZE).
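A rough userspace sketch of what such a linear page index helper computes. The vma is reduced to two plain parameters, and the `SK_` constants and the function name are assumptions chosen for illustration (4K pages, page cache size equal to page size), not the kernel definitions.

```c
#include <assert.h>

#define SK_PAGE_SHIFT       12  /* assumed 4K pages */
#define SK_PAGE_CACHE_SHIFT 12  /* assumed equal to SK_PAGE_SHIFT */

/*
 * File page index that a linear mapping would place at `address`:
 * the page offset of the address inside the vma, plus the file offset
 * (in pages) at which the vma starts, scaled to page cache units.
 */
static unsigned long sk_linear_page_index(unsigned long vm_start,
                                          unsigned long vm_pgoff,
                                          unsigned long address)
{
    unsigned long pgoff;

    assert(address >= vm_start);
    pgoff = (address - vm_start) >> SK_PAGE_SHIFT;
    pgoff += vm_pgoff;
    return pgoff >> (SK_PAGE_CACHE_SHIFT - SK_PAGE_SHIFT);
}
```

A page whose actual index differs from this value at its mapped address is one that has been placed nonlinearly.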
Ooh, I thought I'd removed ___add_to_page_cache last time, do so now.
unmap_page_range static, shift its hugepage check up into sole caller
unmap_vmas. Killed "killme" debug from unmap_vmas, not seen it trigger.
unmap_mapping_range is exported without restriction: I'm one of those who
believe it should be generally available. But I'm wrongly placed to decide
that, probably just sob quietly to myself if _GPL added later.
Andrew Morton [Sun, 18 Apr 2004 03:55:06 +0000 (20:55 -0700)]
[PATCH] rmap: swap_unplug page
From: Hugh Dickins <hugh@veritas.com>
Good example of "swapper_space considered harmful": swap_unplug_io_fn was
originally designed for calling via swapper_space.backing_dev_info; but
that way it loses track of which device is to be unplugged, so had to
unplug all swap devices. But now sync_page tests SwapCache anyway, can
call swap_unplug_io_fn with page, which leads direct to the device.
Reverted -mc4's CONFIG_SWAP=n fix, just add another NOTHING for it.
Reverted -mc3's editorial adjustments to swap_backing_dev_info and
swapper_space initializations: they document the few fields which are
actually used now, as comment above them says (sound of slapped wrist).
Andrew Morton [Sun, 18 Apr 2004 03:54:52 +0000 (20:54 -0700)]
[PATCH] rmap: flush_dcache revisited
From: Hugh Dickins <hugh@veritas.com>
One of the callers of flush_dcache_page is do_generic_mapping_read, where
file is read without i_sem and without page lock: concurrent truncation may
at any moment remove page from cache, NULLing ->mapping, making
flush_dcache_page liable to oops. Put result of page_mapping in a local
variable and apply mapping_mapped to that (if we were to check for NULL
within mapping_mapped, it's unclear whether to say yes or no).
parisc and arm do have other locking unsafety in their i_mmap(_shared)
searching, but that's a larger issue to be dealt with down the line.
Andrew Morton [Sun, 18 Apr 2004 03:54:27 +0000 (20:54 -0700)]
[PATCH] Fix unix module
From: Rusty Russell <rusty@rustcorp.com.au>
# lsmod
Module Size Used by
1 26060 6
#
The compiler #define's unix to 1: we use -DKBUILD_MODNAME=unix. We used to
#undef unix at the top of af_unix.c, but now the name is inserted by
modpost, that doesn't help.
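The preprocessor effect behind the "1" in the lsmod output can be reproduced with a short sketch. To keep it deterministic on any compiler, `sketch_unix` stands in for the predefined `unix` macro and `SK_MODNAME` for the name passed with -DKBUILD_MODNAME; both names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the argument is macro-expanded first. */
#define SK_STR1(x) #x
#define SK_STR(x)  SK_STR1(x)

#define sketch_unix 1           /* stands in for the compiler's `unix` */
#define SK_MODNAME  sketch_unix /* stands in for -DKBUILD_MODNAME=unix */

/* Expanded while sketch_unix is still defined: the name becomes "1". */
static const char *name_with_predefine(void)
{
    return SK_STR(SK_MODNAME);
}

#undef sketch_unix

/* Expanded after the #undef: the intended name survives. */
static const char *name_after_undef(void)
{
    return SK_STR(SK_MODNAME);
}

/* Usage check for the two cases above. */
static void sketch_check(void)
{
    assert(strcmp(name_with_predefine(), "1") == 0);
    assert(strcmp(name_after_undef(), "sketch_unix") == 0);
}
```

The #undef only helps in the translation unit where the name is actually expanded, which is why an #undef at the top of af_unix.c stops working once the name is emitted somewhere else.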
Andrew Morton [Sun, 18 Apr 2004 03:54:15 +0000 (20:54 -0700)]
[PATCH] ppc64: Fix CPU hot unplug deadlock
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
My RTAS locking fixes incorrectly added a spinlock around the function used
to stop a CPU. That function never returns, so the lock becomes stale.
The correct fix is to disable interrupts instead (the RTAS params being
per-CPU, this should be safe enough).
Russell King [Sat, 17 Apr 2004 23:19:03 +0000 (00:19 +0100)]
[ARM] Add detailed documentation concerning ARM page tables
This adds detailed documentation concerning how we map the Linux
page table structure onto the hardware tables on ARM. In addition,
it also adds documentation describing how we emulate the "dirty"
and "young" or "accessed" page table bits.
This should be of interest to Linux MM developers.
It occurred to me that if vma and new_vma are one and the same, then
vma_relink_file will not do a good job of linking it after itself - in
that pretty unlikely case when move_page_tables fails.
And more generally, whenever copy_vma's vma_merge succeeds, we have no
guarantee that old vma comes before new_vma in the i_mmap lists, as we
need to satisfy Rajesh's point: that ordering is only guaranteed in the
newly allocated case.
We have to abandon the ordering method when/if we move from lists to
prio_trees, so this patch switches to the less glamorous use of
i_shared_sem exclusion, as in my prio_tree mremap.
Pavel Roskin [Sat, 17 Apr 2004 10:41:18 +0000 (11:41 +0100)]
[PCMCIA] Conversion to module_param
Patch from: Pavel Roskin
As it turns out, mixing MODULE_PARM and module_param in one module is
wrong. The parameters specified in module_param are ignored. I've just
posted a patch to LKML that will detect this condition and warn about it.
The new debugging code used the new-style module_param, which means that
all instances of MODULE_PARM should be converted. The attached patch does
that.
An additional bonus is that module_param_array provides the number of
array elements. This allowed me to change tcic.c and i82365.c to use
this number for IRQ list. This change was tested with i82365. If
"irq_list" is not specified, irq_list_count is 0.
I set all permissions to 0444 to be safe. I think we have no secrets
from the users regarding those parameters. If some parameters can be
changed safely at runtime, the permissions could be changed to 0644.
I didn't examine how safe (and how useful) it would be, so it's 0444 for
now.
Andrew Morton [Sat, 17 Apr 2004 10:32:46 +0000 (03:32 -0700)]
[PATCH] ARM-related ptep_to_address() fix
From: William Lee Irwin III <wli@holomorphy.com>
rmk mentioned that ARM was borked as the relation, assumed by generic rmap,
PTRS_PER_PTE*sizeof(pte_t) == PAGE_SIZE, fails to hold. The following
patch, developed jointly with him (or depending on POV, by him with me
acting as codemonkey), is reported to resolve the issue.
Specifically, while ARM dedicates an entire PAGE_SIZE -sized block of
memory to each PTE table, the PTE table itself only spans half that, the
remainder being dedicated to hardware-interpreted structures. As the
hardware structure must be contiguous, wider ptes can't be used. So the
core-visible PTE table only spans PAGE_SIZE/2 bytes, violating the
assumption. This corrects masking and scaling done in ptep_to_address().
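The arithmetic can be checked with a small sketch. The constants are assumptions chosen to match the description (4K pages, 4-byte ptes, 512 entries, so the table spans PAGE_SIZE/2), and the two functions are hypothetical renderings of the masking and scaling before and after the change, not the actual rmap code.

```c
#include <assert.h>

#define SK_PAGE_SIZE    4096ul
#define SK_PTE_SIZE     4ul
#define SK_PTRS_PER_PTE 512ul
#define SK_TABLE_BYTES  (SK_PTRS_PER_PTE * SK_PTE_SIZE) /* 2048 = PAGE_SIZE/2 */

/*
 * Old scheme: valid only if PTRS_PER_PTE * sizeof(pte_t) == PAGE_SIZE.
 * Mask with the page size and scale by the number of entries.
 */
static unsigned long sk_offset_old(unsigned long ptep_addr)
{
    return (ptep_addr & (SK_PAGE_SIZE - 1)) * SK_PTRS_PER_PTE;
}

/*
 * Adjusted scheme: mask with the size of the table itself and scale by
 * the number of bytes of address space one byte of pte covers.
 */
static unsigned long sk_offset_fixed(unsigned long ptep_addr)
{
    assert((SK_TABLE_BYTES & (SK_TABLE_BYTES - 1)) == 0); /* power of two */
    return (ptep_addr & (SK_TABLE_BYTES - 1)) * (SK_PAGE_SIZE / SK_PTE_SIZE);
}
```

For entry 3 of a table at a page-aligned base, the pte sits at byte offset 12. The old scheme yields 12 * 512 = 6144, while the expected offset is 3 * 4096 = 12288, which the adjusted scheme produces.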
Andrew Morton [Sat, 17 Apr 2004 10:29:12 +0000 (03:29 -0700)]
[PATCH] Fix bogus get_page() calls in hugepage code
From: David Gibson <david@gibson.dropbear.id.au>
Some versions of follow_huge_addr() and follow_huge_pmd() are doing a
get_page() on the target page. They shouldn't: follow_page() returns an
unpinned page and it is the caller's responsibility to pin the page (if
desired) before dropping page_table_lock.