Add irq_enter/exit to smp_call_function_interrupt():
arch/i386/kernel/microcode.c:do_microcode_update() calls
smp_call_function(do_update_one). do_update_one() does
spin_lock/unlock.
Remove unneeded GET_THREAD_INFO(%ebx) in device_not_available() trap in
entry.S
Merge penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/tls-tree
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
what is TLS? Thread Local Storage is a concept used by threading
abstractions - a fast and efficient way to store per-thread local (but not
on-stack local) data. The __thread extension is already supported by gcc.
proper TLS support in compilers (and glibc/pthreads) is a bit problematic
on the x86 platform. There are only 8 general purpose registers available,
so on x86 we have to use segments to access the TLS. The approach used by
glibc so far was to set up a per-thread LDT entry to describe the TLS.
Besides the generic unrobustness of LDTs, this also introduced a limit:
the maximum number of LDT entries is 8192, so the maximum number of
threads per application is 8192.
this patch does it differently - the kernel keeps a specific per-thread
GDT entry that can be set up and modified by each thread:
asmlinkage int sys_set_thread_area(unsigned int base,
unsigned int limit, unsigned int flags)
the kernel, upon context-switch, modifies this GDT entry to match that of
the thread's TLS setting. This way user-space threaded code can access
per-thread data via this descriptor - by using the same, constant %gs (or
%fs) selector. The number of TLS areas is unlimited, and there is no
additional allocation overhead associated with TLS support.
the biggest problem preventing the introduction of this concept was
Linux's global shared GDT on SMP systems. The patch fixes this by
implementing a per-CPU GDT, which is also a nice context-switch speedup,
2-task lat_ctx context-switching got faster by about 5% on a dual Celeron
testbox. [ Could it be that a shared GDT is fundamentally suboptimal on
SMP? perhaps updating the 'accessed' bit in the DS/CS descriptors causes
some sort of locked memory cycle overhead? ]
the GDT layout got simplified:
* 0 - null
* 1 - Thread-Local Storage (TLS) segment
* 2 - kernel code segment
* 3 - kernel data segment
* 4 - user code segment <==== new cacheline
* 5 - user data segment
* 6 - TSS
* 7 - LDT
* 8 - APM BIOS support <==== new cacheline
* 9 - APM BIOS support
* 10 - APM BIOS support
* 11 - APM BIOS support
* 12 - PNPBIOS support <==== new cacheline
* 13 - PNPBIOS support
* 14 - PNPBIOS support
* 15 - PNPBIOS support
* 16 - PNPBIOS support <==== new cacheline
* 17 - not used
* 18 - not used
* 19 - not used
set_thread_area() currently recognizes the following flags:
- in theory we could avoid the 'limit in pages' bit, but i wanted to
preserve the flexibility to potentially enable the setting of
byte-granularity stack segments for example. And unlimited segments
(granularity = pages, limit = 0xfffff) might have a performance
advantage on some CPUs. We could also automatically figure out the best
possible granularity for a given limit - but i wanted to avoid this kind
of guesswork. Some CPUs might have a plus for page-limit segments - who
knows.
- The 'writable' flag is straightforward and could be useful to some
applications.
- The 'clear' flag clears the TLS. [note that a base 0 limit 0 TLS is in
fact legal, it's a single-byte segment at address 0.]
(the system-call does not expose any other segment options to user-space,
privilege level is 3, the segment is 32-bit, etc. - it's using safe and
sane defaults.)
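
A minimal sketch of how a (base, limit, flags) triple could be packed into
the two 32-bit words of an x86 segment descriptor. This is not the patch's
code: the EX_FLAG_* names and values, struct ex_desc and ex_make_tls_desc()
are hypothetical and chosen only for illustration. The bit positions follow
the standard x86 descriptor layout, and the fixed defaults (present,
privilege level 3, 32-bit data segment) are the ones described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names and values, for illustration only. */
#define EX_FLAG_LIMIT_IN_PAGES 0x1u
#define EX_FLAG_WRITABLE       0x2u
#define EX_FLAG_CLEAR          0x4u

/* The two 32-bit halves of an 8-byte x86 segment descriptor. */
struct ex_desc {
    uint32_t a; /* low word:  base[15:0] << 16 | limit[15:0] */
    uint32_t b; /* high word: base[31:24], attributes, limit[19:16], base[23:16] */
};

static struct ex_desc ex_make_tls_desc(uint32_t base, uint32_t limit,
                                       uint32_t flags)
{
    struct ex_desc d = { 0, 0 };

    /* 'clear' yields an all-zero (not present) descriptor. */
    if (flags & EX_FLAG_CLEAR)
        return d;

    /* The limit field is only 20 bits wide. */
    assert(limit <= 0xfffffu);

    d.a = ((base & 0xffffu) << 16) | (limit & 0xffffu);
    d.b = (base & 0xff000000u)
        | ((base >> 16) & 0xffu)
        | (limit & 0xf0000u)
        | 0x0000f000u   /* present, DPL 3, code/data (non-system) segment */
        | 0x00400000u;  /* D/B = 1: 32-bit segment */

    if (flags & EX_FLAG_WRITABLE)
        d.b |= 0x00000200u;   /* data segment: writable */
    if (flags & EX_FLAG_LIMIT_IN_PAGES)
        d.b |= 0x00800000u;   /* G = 1: limit counted in 4K pages */

    return d;
}
```

With base 0, limit 0 and no flags, this sketch still produces a present
descriptor (a single-byte segment at address 0), which is why a separate
'clear' flag is needed to express "no TLS".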
NOTE: the interface does not allow the changing of the TLS of another
thread on purpose - that would just complicate the interface (and
implementation) unnecessarily. Is there any good reason to allow the
setting of another thread's TLS?
NOTE2: non-pthreads glibc applications can call set_thread_area() to set
up a GDT entry just below the end of stack. We could use some sort of
default TLS area as well, but that would hard-code a given segment.
A much needed (and widely tested) ACPI bugfix for kernel 2.5.28:
A u8 was cast to a u32, and then all 32 bits were zeroed. This can corrupt
neighbouring values, e.g. "unsigned long flags". When these flags (now 0)
are "restored", the system locks hard.
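
A small sketch of this class of bug, assuming the cast was applied to a
pointer so that a 32-bit store landed on a one-byte object. The function
names are hypothetical, and memset() over four bytes stands in for the
32-bit store so that the example stays well-defined C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Buggy pattern: the caller owns one byte at p, but four bytes are
 * zeroed, as a store through a (u32 *) cast would do. The three bytes
 * after p belong to whatever happens to be stored next to it.
 */
static void ex_clear_buggy(uint8_t *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof(uint32_t));
}

/* Fixed pattern: only the byte that actually belongs to the caller. */
static void ex_clear_fixed(uint8_t *p)
{
    assert(p != NULL);
    *p = 0;
}
```

If the neighbouring bytes hold saved interrupt flags, the buggy variant
leaves them zeroed, which matches the hard lock described above.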
Patrick Mochel [Thu, 25 Jul 2002 05:12:04 +0000 (22:12 -0700)]
driverfs: Don't use VFS for file or directory deletion
These are tied together a bit, so they're included in the same patch
Mainly, they move the taking of the inode's i_sem into the unlink and rmdir.
driverfs_rmdir doesn't call driverfs_unlink anymore, as it checks if the directory is empty
and conditionally does d_delete on it.
fs/namei.c implements d_unhash, which is called in vfs_rmdir. This isn't exported (yet),
so reimplement it here (at least until it's known that it's not needed or it's exported).
Patrick Mochel [Thu, 25 Jul 2002 04:24:54 +0000 (21:24 -0700)]
driverfs: don't use vfs for creating symlinks
Add check for existence of dentry in driverfs_symlink and driverfs_mknod (which the other creation
functions use).
Patrick Mochel [Thu, 25 Jul 2002 04:13:01 +0000 (21:13 -0700)]
driverfs: stop using vfs layer for file creation
This is the first of a series of patches to driverfs to _not_ use the vfs layer for file creation
and deletion.
The VFS layer is allowing files and directories to be removed from userspace, which we don't want
at all.
Per Al Viro's suggestion, I am pushing the necessary checks from the vfs_* functions into the
driverfs functions, and calling them directly from the kernel interface to driverfs.
Sam Ravnborg [Thu, 25 Jul 2002 02:34:53 +0000 (19:34 -0700)]
[PATCH] docbook: Call docbook makefile with -f [9/9]
The rewritten makefile for DocBook requires that working directory
is $(TOPDIR) therefore use -f Documentation/DocBook/Makefile to
invoke the docbook makefile.
Sam Ravnborg [Thu, 25 Jul 2002 02:34:15 +0000 (19:34 -0700)]
[PATCH] docbook: Update documentation to reflect new docproc [7/9]
kernel-doc-nano-HOWTO.txt updated to reflect new functionality
provided by docproc.
gen-all-syms and docgen description removed.
kernel-api.tmpl and parportbook.tmpl updated to specify files to search
for EXPORT_SYMBOL* to enable documentation of all relevant functions.
Sam Ravnborg [Thu, 25 Jul 2002 02:33:55 +0000 (19:33 -0700)]
[PATCH] docbook: Makefile cleanup [6/9]
Massive cleanup of makefile.
Comments added as well.
Enabled by the new functionality provided by docproc
When generating HTML, a new file that points to the book in question
is placed in the DocBook dir.
Sam Ravnborg [Thu, 25 Jul 2002 02:33:34 +0000 (19:33 -0700)]
[PATCH] docbook: scripts/docproc improved [5/9]
This is the first patch in a series to clean up the DocBook
Makefile.
docproc is extended to include the functionality previously provided by
gen-all-syms and docgen. Furthermore the necessity to specify which
files to search for EXPORT_SYMBOL is removed; the information is now
read from the .tmpl files.
docproc is furthermore extended to generate dependency information.
gen-all-syms and docgen are deleted.
Sam Ravnborg [Thu, 25 Jul 2002 02:33:15 +0000 (19:33 -0700)]
[PATCH] kernel-doc: Fix warnings [4/9]
During processing of skbuff.h three warnings were issued,
because members of an enum within a struct were not documented.
This patch fixes kernel-doc not to spit out these non-valid warnings.
Originally by Thunder.
Petr Vandrovec [Thu, 25 Jul 2002 02:27:25 +0000 (19:27 -0700)]
[PATCH] ipx use of cli/sti
This removes cli/sti from SPX registration code in IPX. I decided to
use normal rw_semaphore instead of net_family_{write,read}_{lock,unlock}
used in net/socket.c.
I left SPX code itself alone: I do not use it, and last time I checked
it was a very unreliable reliable transport.
This patch for 2.5.28 reduces the stack frame size of
arch/i386/kernel/nmi.c:check_nmi_watchdog() from 4096 bytes
in the worst case to 128 bytes.
The problem with the current code is that it copies the entire
irq_stat[] array, when only a single field (__nmi_count) is of
interest. The irq_stat_t element type is only 28 bytes, but it
is also ____cacheline_aligned, and that blows the array up to
4096 bytes on SMP P4 Xeons, 2048 bytes on SMP K7s, and 1024 bytes
on SMP P5/P6s. The patch reduces this to NR_CPUS*4==128 bytes.
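
The size arithmetic can be sketched as follows. This is not the kernel
code: ex_irq_stat_t, EX_NR_CPUS and EX_CACHELINE are hypothetical stand-ins,
assuming 32 CPUs, a 128-byte cache line (the P4 case) and a 4-byte
unsigned int. The point is that copying one field per CPU needs
NR_CPUS*4 bytes, while copying the aligned array needs NR_CPUS*128.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NR_CPUS   32
#define EX_CACHELINE 128   /* assumed line size, as on SMP P4 Xeons */

/* 28 bytes of payload, padded to a full cache line by the alignment. */
typedef struct {
    _Alignas(EX_CACHELINE) unsigned int nmi_count;
    unsigned int other[6];   /* stand-in for the remaining fields */
} ex_irq_stat_t;

_Static_assert(sizeof(ex_irq_stat_t) == EX_CACHELINE,
               "alignment pads each element to one cache line");

static ex_irq_stat_t ex_irq_stat[EX_NR_CPUS];

/* Copy only the field of interest instead of the whole padded array. */
static void ex_snapshot_nmi_counts(unsigned int out[EX_NR_CPUS])
{
    assert(out != NULL);
    for (size_t cpu = 0; cpu < EX_NR_CPUS; cpu++)
        out[cpu] = ex_irq_stat[cpu].nmi_count;
}
```

A caller would declare "unsigned int snap[EX_NR_CPUS];" on its stack
(128 bytes here) rather than a local copy of the 4096-byte array.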
[PATCH] Ensure xtime_lock and timerlist_lock are on difft cachelines
I've noticed that xtime_lock and timerlist_lock end up on the same
cacheline all the time (at least on x86). Not a good thing for
loads with high xxx_timer and do_gettimeofday counts I guess (networking etc).
Richard Russon [Thu, 25 Jul 2002 02:14:52 +0000 (19:14 -0700)]
[PATCH] New LDM Driver (Windows Dynamic Disks)
This is a complete rewrite of the LDM driver (support for Windows
Dynamic Disks). It incorporates Al Viro's recent partition handling
tidy ups.
Details:
LDM Driver rewritten. More efficient. Much smaller memory footprint.
The old driver was little more than a stopgap.
The new driver:
- is a complete rewrite
- is based on a much better understanding of the database
- is based on much more reverse engineering
- is more able to spot errors and inconsistencies
- has a much smaller memory footprint
- is no longer considered experimental
- is accompanied by brief info: Documentation/ldm.txt
Anton Blanchard [Thu, 25 Jul 2002 01:54:01 +0000 (18:54 -0700)]
[PATCH] Missing memory barrier in pte_chain_unlock
On a ppc64 machine running 2.5.28 we were hitting this BUG in
__free_pages_ok:
BUG_ON(page->pte.chain != NULL);
In pte_chain_lock we use test_and_set_bit which implies a memory
barrier. In pte_chain_unlock we use clear_bit which has no memory
barriers so we need to add one.
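
The lock/unlock pairing can be sketched in user space with C11 atomics,
which stand in here for the kernel's bit operations. The names
(ex_chain_trylock, ex_chain_unlock, EX_CHAIN_LOCK_BIT) are hypothetical,
and the source does not say which barrier primitive the patch adds. The
idea is only that the unlock must order earlier stores before the bit is
cleared, which release ordering expresses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_CHAIN_LOCK_BIT 0x1ul   /* hypothetical lock bit */

/* Set the lock bit; acquire ordering keeps the critical section after it. */
static bool ex_chain_trylock(atomic_ulong *flags)
{
    assert(flags != NULL);
    unsigned long old = atomic_fetch_or_explicit(flags, EX_CHAIN_LOCK_BIT,
                                                 memory_order_acquire);
    return (old & EX_CHAIN_LOCK_BIT) == 0;   /* true if we took the lock */
}

/*
 * Clear the lock bit; release ordering makes stores done under the lock
 * visible before another CPU can observe the bit as clear.
 */
static void ex_chain_unlock(atomic_ulong *flags)
{
    assert(flags != NULL);
    atomic_fetch_and_explicit(flags, ~EX_CHAIN_LOCK_BIT,
                              memory_order_release);
}
```

Without the ordering on the unlock side, a store such as resetting
page->pte.chain could become visible after the lock appears free, which
is consistent with the BUG_ON being hit on a weakly ordered machine.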
Patrick Mochel [Thu, 25 Jul 2002 01:50:07 +0000 (18:50 -0700)]
Remove BKL from driverfs
- in mkdir: we already hold parent directory's semaphore (c.f. driverfs_create_dir)
- in create: ditto (c.f. driverfs_create_file)
- in unlink: ditto (c.f. driverfs_remove_file) and file's i_sem is taken in vfs_unlink
- in lseek: take inode's i_sem (though I think we can replace this with a common lseek function...later)
Add EVIOCSABS() ioctl to change the abs* informative
values on input devices. This is something the X people
really wanted.
Rename input_devinfo to input_id, it's shorter and more
to the point.
Remove superfluous printks in uinput.c
Clean up return values in evdev.c ioctl.
Because the Linux Input core follows the USB HID standard where it
comes to directions of movement and rotation, a mouse wheel should
be positive where it "rotates forward, away from the user". We had
the opposite in psmouse.c. Fixed this.
By popular request, an explicit method of telling which events
from a device belong together was implemented - input_sync() and
EV_SYN. Touches every input driver. The first to make use of it
is mousedev.c to properly merge events into PS/2 packets.
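
A rough user-space model of how a consumer might use the sync marker to
merge events into one packet. This is not mousedev.c: struct ex_merger,
struct ex_packet and ex_feed() are hypothetical, and the EX_* constants
are local copies intended to mirror the EV_SYN, EV_REL, REL_X and REL_Y
codes from linux/input.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of event codes, intended to mirror linux/input.h. */
enum { EX_EV_SYN = 0x00, EX_EV_REL = 0x02 };
enum { EX_REL_X = 0x00, EX_REL_Y = 0x01 };

struct ex_packet { int dx, dy; };
struct ex_merger { struct ex_packet pending; };

/*
 * Feed one event. Relative motion is accumulated; a sync event closes
 * the group, copies it to *out and returns true.
 */
static bool ex_feed(struct ex_merger *m, int type, int code, int value,
                    struct ex_packet *out)
{
    assert(m != NULL && out != NULL);

    if (type == EX_EV_REL) {
        if (code == EX_REL_X)
            m->pending.dx += value;
        else if (code == EX_REL_Y)
            m->pending.dy += value;
        return false;
    }

    if (type == EX_EV_SYN) {
        *out = m->pending;
        m->pending = (struct ex_packet){ 0, 0 };
        return true;
    }

    return false;   /* other event types are ignored in this sketch */
}
```

A driver reporting an X and a Y movement followed by a sync would thus
yield a single packet carrying both deltas, instead of two packets.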
This patch adds two new serio input drivers. Both are "UART" type
drivers for PS/2 ports on both StrongARM and ARM Integrator hardware.
-- Russell King
The following patch adds the "resend" capability to the keyboard driver;
when the host driver detects a parity or framing error, we can ask the
keyboard to resend the data, instead of treating random garbage as
valid data.
We also export serio_interrupt() - serio modules are using it, yet you
can't use them as modules without this symbol exported.
-- Russell King
After some grepping and talking to maintainers, I did the appended cleanup
patch. This should be it from me until char/keyboard.c becomes a real input
layer client, but this final patch will be _very_ small now :-)).
-- Franz Sirl
Add an i8042_restore_ctr command line option. Restoring the CTR
value after an AUX write breaks Transmeta Crusoe i8042 chip
emulation, so by default it is no longer done. The option might
be needed on some ancient hardware, though.
Move registration of the KBD interface after the AUX probe. This
makes sure we don't kill the keyboard by probing for the AUX port.
Fix a bug in i8042.c which only enables the interfaces after the
probe routine for mice/keyboards was executed.
Add some paranoia flush/sync calls to make sure the chip doesn't get
stuck.
Don't check_region, we already request_region where applicable.
Do i8042_reset on all non-PC architectures.
Cleanup comments.