[NCLUG] system hangs for 5-10 seconds every few minutes

Sat Sep 1 21:23:40 MDT 2007

Daniel Herrington wrote:
> I'm using ssh over a usb wireless 802.11b adapter.

Hmm...  What adapter is it?  I have heard of people having random
trouble with some NICs.  Unfortunately I cannot recall which ones.
Since yours is a USB device, lsusb should identify it (lspci only
lists PCI devices).

  lsusb

> The embedded machine has 128MB of memory.

Small, but it should be quite sufficient.  You really should not be
seeing RAM-based performance problems.  My main firewall at this
moment is a Compaq Deskpro 2000 Pentium 133MHz with 32MB of main
memory and 500MB of disk.  :-)

> I tried vmstat, and it shows 0 for both si and so.

Good.  Then it is very unlikely to be a swapping issue.  At least
vmstat is not showing any swapping activity.  Just to put a nail in
the coffin, does 'free' show any swap being used?
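
If you want a single number instead of eyeballing the 'free' output,
here is a rough sketch that computes swap in use from the SwapTotal
and SwapFree lines of /proc/meminfo.  The function name swap_used_kb
is just something I made up for illustration.

```shell
# swap_used_kb: read /proc/meminfo-style text on stdin and print
# SwapTotal minus SwapFree, in kB.  (Hypothetical helper name.)
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }'
}

# Typical use on the live system:
#   swap_used_kb < /proc/meminfo
```

A result of 0 would agree with the si/so columns you saw in vmstat.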

> The top program is also pausing, and when I tried it with  
> "top -b -d 1 | tee top.txt", the top.txt never showed more than about  
> 20% cpu usage by the user, and never more than about 11% by the  
> system.

This is where you need a Linux kernel hacker to walk you through the
Alt-SysRq interface.  When the kernel becomes unresponsive this may
provide more information.

  http://www.linuxhowtos.org/Tips%20and%20Tricks/sysrq.htm

But you are connecting over the network, so no SysRq key is
available, and that makes things more difficult.
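
One partial workaround, assuming your kernel was built with magic
SysRq support, is that root can write a command letter to
/proc/sysrq-trigger from a shell.  It will not help while the box is
actually frozen, but a task dump taken right after a pause might
still be informative.  A minimal sketch follows.  The wrapper name
sysrq_send and its optional second argument are my own invention.

```shell
# sysrq_send KEY [TRIGGER]: write one SysRq command letter to the
# trigger file.  TRIGGER defaults to /proc/sysrq-trigger and is only
# a parameter so the function can be tried against an ordinary file.
# (Hypothetical helper name.)
sysrq_send() {
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    case $key in
        [a-z0-9]) printf '%s\n' "$key" > "$trigger" ;;
        *) echo "usage: sysrq_send KEY [TRIGGER]" >&2; return 1 ;;
    esac
}

# As root:
#   sysrq_send t    # 't' asks the kernel to dump task states
#   dmesg | tail    # the dump lands in the kernel log
```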

> A couple of processes that I'm wondering about are kjournald  

That would be the kernel journal daemon and I suspect you have one
associated with each ext3 filesystem.

> and NetworkManager. Maybe these are too resource-intensive for my  
> measley Pentium 200 MMX single-board computer.

NetworkManager has more baggage associated with it, but if you are
not swapping then I don't think that should be the issue.

> It would be nice if I could see how much cpu time all the processes  
> on the system took over a given period of time.

I am thinking that some kernel driver is blocking for too long and is
timing out in some way before returning to normal timesharing.  This
would be easier with a real console because then you could unload
almost all of the drivers, including the wireless driver, and then
see if the problem persists.  If it goes away, add them back in one
by one and see when the problem starts up again.
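
As for seeing how much CPU time each process took over a period, one
rough approach is to snapshot the utime and stime counters (fields 14
and 15) from /proc/PID/stat twice and subtract.  This is only a
sketch.  The function names cpu_snapshot and cpu_delta are made up
here, and the numbers are in clock ticks (usually 100 per second),
not seconds.

```shell
# cpu_snapshot [PROCDIR]: print "PID TICKS" for every process, where
# TICKS is utime + stime from PROCDIR/PID/stat.  The command name in
# field 2 may contain spaces, so everything up to the last ") " is
# stripped before counting fields.  (Hypothetical helper name.)
cpu_snapshot() {
    procdir=${1:-/proc}
    for f in "$procdir"/[0-9]*/stat; do
        [ -r "$f" ] || continue
        awk '{ pid = $1; sub(/^.*\) /, ""); print pid, $12 + $13 }' \
            "$f" 2>/dev/null
    done
}

# cpu_delta BEFORE AFTER: print "TICKS PID" for each process whose
# counter grew between two snapshot files, busiest first.
cpu_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) { d = $2 - before[$1]; if (d > 0) print d, $1 }' \
        "$1" "$2" | sort -rn
}

# Typical use:
#   cpu_snapshot > /tmp/a; sleep 60; cpu_snapshot > /tmp/b
#   cpu_delta /tmp/a /tmp/b | head
```

Kernel threads such as kjournald have stat files too, so they would
show up in the same list.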

Anything in the syslog?

What is the current console logging level?  Perhaps turning on full
kernel messages would show something?

  dmesg -n8

  # dmesg -nNUMBER sets the level at which the kernel will log
  # messages to the console.  A message is shown only if its level
  # is less than NUMBER.  To avoid logging KERN_INFO, 6 or lower
  # must be used.
  #
  #    #define KERN_EMERG   "<0>" /* system is unusable                    */
  #    #define KERN_ALERT   "<1>" /* action must be taken immediately      */
  #    #define KERN_CRIT    "<2>" /* critical conditions                   */
  #    #define KERN_ERR     "<3>" /* error conditions                      */
  #    #define KERN_WARNING "<4>" /* warning conditions                    */
  #    #define KERN_NOTICE  "<5>" /* normal but significant condition      */
  #    #define KERN_INFO    "<6>" /* informational                         */
  #    #define KERN_DEBUG   "<7>" /* debug-level messages                  */
  #
  # The kernel default is 7 so that everything except KERN_DEBUG is
  # logged to the console.
  # At least one other distro sets this to 3 in /etc/sysconfig/init
  # and so users there never see most console messages.
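
To answer the "current level" question without guessing, the first
number in /proc/sys/kernel/printk is the console log level.  Below is
a small sketch that reads it and lists which of the levels above
would reach the console.  Both function names are hypothetical, made
up for this example.

```shell
# console_loglevel [FILE]: print the first field of the printk
# sysctl file, which is the current console log level.
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}

# console_levels_shown N: list the KERN_* levels that are printed to
# the console when the console log level is N (level < N).
console_levels_shown() {
    n=$1
    i=0
    for name in KERN_EMERG KERN_ALERT KERN_CRIT KERN_ERR \
                KERN_WARNING KERN_NOTICE KERN_INFO KERN_DEBUG; do
        if [ "$i" -lt "$n" ]; then
            echo "$name"
        fi
        i=$((i + 1))
    done
}

# Typical use:
#   console_levels_shown "$(console_loglevel)"
```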

Sorry, these are not very good suggestions.

Bob