Documentation/must-fix.txt |  354 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 354 insertions(+)

diff -puN /dev/null Documentation/must-fix.txt
--- /dev/null	2002-08-30 16:31:37.000000000 -0700
+++ 25-akpm/Documentation/must-fix.txt	2003-10-03 02:14:46.000000000 -0700
@@ -0,0 +1,354 @@
+
+Must-fix bugs
+=============
+
+drivers/char/
+~~~~~~~~~~~~~
+
+o TTY locking is broken.
+
+  o see FIXME in do_tty_hangup().  This causes ppp BUGs in local_bh_enable()
+
+  o Other problems: aviro, dipankar, Alan have details.
+
+  o somebody will have to document the tty driver and ldisc API
+
+o Lack of test cases and/or stress tests is a problem.  Contributions and
+  suggestions are sought.
+
+o Lots of drivers are using cli/sti and are broken.
+
+drivers/tty
+~~~~~~~~~~~
+
+o viro: we need to fix refcounting for tty_driver (oopsable race, must fix
+  anyway, hopefully about a week until it's merged) then we can do
+  tty/misc/upper levels of sound.
+
+drivers/block/
+~~~~~~~~~~~~~~
+
+o ideraid hasn't been ported to 2.5 at all yet.
+
+  We need to understand whether the proposed BIO split code will suffice
+  for this.
+
+o CD burning.  There are still a few quirks to solve wrt SG_IO and ide-cd.
+
+  Jens: The basic hang has been solved (double fault in ide-cd), there still
+  seems to be some cases that don't work too well.  Don't really have a
+  handle on those :/
+
+o lmb: Last time I looked at the multipath code (2.5.50 or so) it also
+  looked pretty broken; I plan to port forward the changes we did on 2.4
+  before KS.
+
+drivers/input/
+~~~~~~~~~~~~~~
+
+o rmk: unconverted keyboard/mouse drivers (there's a deadline of 2.6.0
+  currently on these remaining in my/Linus' tree.)
+
+o viro: large absence of locking.
+
+o viro: parport is nearly as bad as that and there the code is more hairy. 
+  IMO parport is more of "figure out what API changes are needed for its
+  users, get them done ASAP, then fix generic layer at leisure"
+
+o (Albert Cahalan) Lots of people (check Google) get this message from the
+  kernel:
+
+  psmouse.c: Lost synchronization, throwing 2 bytes away.
+
+  (the number of bytes will be 1, 2, or 3)
+
+  At work, I get it when there is heavy NFS traffic.  The mouse goes crazy,
+  jumping around and doing random cut-and-paste all over everything.  This
+  is with a decently fast and modern PC.
+
+o There seem to be too many reports of keyboards and mice failing or acting
+  strangely.
+
+
+drivers/misc/
+~~~~~~~~~~~~~
+
+o rmk: UCB1[23]00 drivers, currently sitting in drivers/misc in the ARM
+  tree.  (touchscreen, audio, gpio, type device.)
+
+  These need to be moved out of drivers/misc/ and into real places
+
+o viro: actually, misc.c has a good chance to die.  With cdev-cidr that's
+  trivial.
+
+drivers/net/
+~~~~~~~~~~~~
+
+o rmk: network drivers.  ARM people like to add tonnes of #ifdefs into
+  these to customise them to their hardware platform (eg, chip access
+  methods, addresses, etc.) I cope with this by not integrating them into my
+  tree.  The result is that many ARM platforms can't be built from even my
+  tree without extra patches.  This isn't sane, and has bred a culture of
+  network drivers not being submitted.  I don't see this changing for 2.6
+  though.
+
+drivers/net/irda/
+~~~~~~~~~~~~~~~~~
+
+o dongle drivers need to be converted to sir-dev
+
+o irport need to be converted to sir-kthread
+
+o new drivers (irtty-sir/smsc-ircc2/donauboe) need more testing
+
+o rmk: Refuse IrDA initialisation if sizeof(structures) is incorrect (I'm
+  not sure if we still need this; I think gcc 2.95.3 on ARM shows this
+  problem though.)
+
+drivers/pci/
+~~~~~~~~~~~~
+
+o alan: Some cardbus crashes the system
+
+  (bugzilla, please?)
+
+drivers/pcmcia/
+~~~~~~~~~~~~~~~
+
+o alan: This is a locking disaster.
+
+  (rmk, brodo: in progress)
+
+drivers/pld/
+~~~~~~~~~~~~
+
+o rmk: EPXA (ARM platform) PLD hotswap drivers (drivers/pld)
+
+  (rmk: will work out what to do here.  maybe drivers/arm/)
+
+drivers/video/
+~~~~~~~~~~~~~~
+
+o Lots of drivers don't compile, others do but don't work.
+
+drivers/scsi/
+~~~~~~~~~~~~~
+
+o hch: large parts of the locking are hosed or not existant
+
+  (Mike Anderson, Patrick Mansfield, Badari Pulavarty)
+
+  o shost->my_devices isn't locked down at all
+
+  o there are lots of members of struct Scsi_Host/scsi_device/scsi_cmnd
+    with very unclear locking, many of them probably want to become
+    atomic_t's or bitmaps (for the 1bit bitfields).
+
+  o there's lots of volatile abuse in the scsi code that needs to be
+    thought about.
+
+  o there's some global variables incremented without any locks
+
+o Convert am53c974, dpt_i2o, initio and pci2220i to DMA-mapping
+
+o Make inia100, cpqfc, pci2000 and dc390t compile
+
+o Convert
+
+   wd33c99 based: a2091 a3000 gpv11 mvme174 sgiwd93
+
+   53c7xx based: amiga7xxx bvme6000 mvme16x initio am53c974 pci2000
+   pci2220i dc390t
+
+  To new error handling
+
+  It also might be possible to shift the 53c7xx based drivers over to
+  53c700 which does the new EH stuff, but I don't have the hardware to check
+  such a shift.
+
+  For the non-compiling stuff, I've probably missed a few that just aren't
+  compilable on my platforms, so any updates would be welcome.  Also, are
+  some of our non-compiling or unconverted drivers obsolete?
+
+o rmk: I have a pending todo: I need to put the scsi error handling through
+  a workout on my scsi bus from hell to make sure it does the right thing
+  and doesn't get wedged.
+
+o James B: USB hot-removal crash: "It's a known scsi refcounting issue."
+
+fs/
+~~~
+
+o AIO/direct-IO writes can race with truncate and wreck filesystems.
+  (Badari has a patch)
+
+o hch: devfs: there's a fundamental lookup vs devfsd race that's only
+  fixable by introducing a lookup vs devfs deadlock.  I can't see how this is
+  fixable without getting rid of the current devfsd design.  Mandrake seems
+  to have a workaround for this so this is at least not triggered so easily,
+  but that's not what I'd consider a fix..
+
+o viro: fs/char_dev.c needs removal of aeb stuff and merge of cdev-cidr. 
+  In progress.
+
+o forward-port sct's O_DIRECT fixes (Badari has a patch)
+
+o viro: there is some generic stuff for namei/namespace/super, but that's a
+  slow-merge and can go in 2.6 just fine
+
+o andi: also soft needs to be fixed - there are quite a lot of
+  uninterruptible waits in sunrpc/nfs
+
+o trond: NFS has a mmap-versus-truncate problem
+
+kernel/sched.c
+~~~~~~~~~~~~~~
+
+o Starvation, general interactivity need close monitoring.
+
+kernel/
+~~~~~~~
+
+o Alan: 32bit uid support is *still* broken for process accounting.
+
+  Create a 32bit uid, turn accounting on.  Shock horror it doesn't work
+  because the field is 16bit.  We need an acct structure flag day for 2.6
+  IMHO
+
+  (alan has patch)
+
+o viro: core sysctl code is racy.  And its interaction wiuth sysfs
+
+o (ingo) rwsems (on x86) are limited to 32766 waiting processes.  This
+  means that setting pid_max to above 32K is unsafe :-(
+
+  An option is to use CONFIG_RWSEM_GENERIC_SPINLOCK variant all the time,
+  for all archs, and not inline any part of the ops.
+
+lib/kobject.c
+~~~~~~~~~~~~~
+
+o kobject refcounting (comments from Al Viro):
+
+  _anything_ can grab a temporary reference to kobject.  IOW, if kobject is
+  embedded into something that could be freed - it _MUST_ have a destructor
+  and that destructor _MUST_ be the destructor for containing object.
+
+  Any violation of the above (and we already have a bunch of those) is a
+  user-triggerable memory corruption.
+
+  We can tolerate it for a while in 2.5 (e.g.  during work on susbsystem we
+  can decide to switch to that way of handling objects and have subsystem
+  vulnerable for a while), but all such windows must be closed before 2.6
+  and during 2.6 we can't open them at all.
+
+o All block drivers which control multiple gendisks with a single
+  request_queue are broken, due to one-to-one assumptions in the request
+  queue sysfs hookup.
+
+mm/
+~~~
+
+o GFP_DMA32 (or something like that).  Lots of ideas.  jejb, zaitcev,
+  willy, arjan, wli.
+
+  Specifically, 64-bit systems need to be able to enforce 32-bit addressing
+  limits for device metadata like network cards' ring buffers and SCSI
+  command descriptors.
+
+o access_process_vm() doesn't flush right.  We probably need new flushing
+  primitives to do this (davem?)
+
+
+modules
+~~~~~~~
+
+  (Rusty)
+
+net/
+~~~~
+
+  (davem)
+
+o UDP apps can in theory deadlock, because the ip_append_data path can end
+  up sleeping while the socket lock is held.
+
+  It is OK to sleep with the socket held held, normally.  But in this case
+  the sleep happens while waiting for socket memory/space to become
+  available, if another context needs to take the socket lock to free up the
+  space we could hang.
+
+  I sent a rough patch on how to fix this to Alexey, and he is analyzing
+  the situation.  I expect a final fix from him next week or so.
+
+o Semantics for IPSEC during operations such as TCP connect suck currently.
+
+  When we first try to connect to a destination, we may need to ask the
+  IPSEC key management daemon to resolve the IPSEC routes for us.  For the
+  purposes of what the kernel needs to do, you can think of it like ARP.  We
+  can't send the packet out properly until we resolve the path.
+
+  What happens now for IPSEC is basically this:
+
+  O_NONBLOCK: returns -EAGAIN over and over until route is resolved
+
+  !O_NONBLOCK: Sleeps until route is resolved
+
+  These semantics are total crap.  The solution, which Alexey is working
+  on, is to allow incomplete routes to exist.  These "incomplete" routes
+  merely put the packet onto a "resolution queue", and once the key manager
+  does it's thing we finish the output of the packet.  This is precisely how
+  ARP works.
+
+  I don't know when Alexey will be done with this.
+
+net/*/netfilter/
+~~~~~~~~~~~~~~~~
+
+  (Rusty)
+
+o Rework conntrack hashing.
+
+o Module relationship bogosity fix (trivial, have patch).
+
+sound/
+~~~~~~
+
+o rmk: several OSS drivers for SA11xx-based hardware in need of
+  ALSA-ification and L3 bus support code for these.
+
+o rmk: linux/sound/drivers/mpu401/mpu401.c and
+  linux/sound/drivers/virmidi.c complained about 'errno' at some time in the
+  past, need to confirm whether this is still a problem.
+
+o rmk: need to complete ALSA-ification of the WaveArtist driver for both
+  NetWinder and other stuff (there's some fairly fundamental differences in
+  the way the mixer needs to be handled for the NetWinder.)
+
+
+  (Issues with forward-porting 2.4 bugfixes.)
+  (Killing off OSS is 2.7 material)
+
+
+global
+~~~~~~
+
+o 64-bit dev_t.  Seems almost ready, but it's not really known how much
+  work is still to do.  Patches exist in -mm but with the recent rise of the
+  neo-viro I'm not sure where things are at.
+
+o Lots of 2.4 fixes including some security are not in 2.5
+
+o There are about 60 or 70 security related checks that need doing
+  (copy_user etc) from Stanford tools.  (badari is looking into this, and
+  hollisb)
+
+o A couple of hundred real looking bugzilla bugs
+
+o viro: cdev rework.  Main group is pretty stable and I hope to feed it to
+  Linus RSN.  That's cdev-cidr and ->i_cdev/->i_cindex stuff
+
+o Athlon prefetch oopses sometimes.  It is currently disabled, and needs to
+  be fixed.
+
+

_