Cutting out the e-mail server overnight is about as painful as cutting it out over lunch.
So last week we had a small glitch at about 6 pm Monday, which resulted in KE tempfailing e-mail all night. Starting at 7:30 the next morning, I pumped queued messages through it from BARIS and kept the system running at a sustained load average of over 25 until lunch, when things started to calm down. The afternoon was slightly squirrelly, but not bad.
Today, “preventative maintenance” on the UPS took out PAX and MIR (and thus all DNS service) for about half an hour starting just before noon. The recovery time for that was about the same as it was for last week’s snafu, and I managed to keep the load average greater than 35 for most of that time. Last week I had sendmail’s queue/refuse load averages set to 30 and 60; today I upped them to 50 and 100. Kind of painful on the IMAP and POP processes, but not at all unusable.
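For anyone who hasn't played with those two knobs, here is a rough Python sketch of how the queue and refuse thresholds interact. It is a simplification under my own assumptions, not sendmail's actual code: the function name `mail_action` and its return strings are made up for illustration, and real sendmail also weighs message priority when deciding whether to queue, which this ignores.

```python
def mail_action(load_avg, queue_la, refuse_la):
    """Simplified model of sendmail's load-average thresholds.

    Hypothetical helper, not part of sendmail:
      - above refuse_la: stop accepting new SMTP connections
      - above queue_la:  accept mail but queue it instead of delivering now
      - otherwise:       deliver immediately
    """
    if load_avg > refuse_la:
        return "refuse"
    if load_avg > queue_la:
        return "queue"
    return "deliver"


if __name__ == "__main__":
    # Compare last week's settings (30/60) with today's (50/100)
    # at a load average of 35.
    for queue_la, refuse_la in [(30, 60), (50, 100)]:
        print(queue_la, refuse_la, mail_action(35, queue_la, refuse_la))
```

In this toy model, a load average of 35 means "queue" under the 30/60 settings and "deliver" under 50/100, which is roughly why raising the thresholds lets the box keep working through the backlog at the cost of a higher load.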
What this tells me (and it's something I already knew) is that e-mail usage here peaks around lunchtime. I can look at the Cricket graphs of normal load average, number of messages processed, etc., but this is something visceral that a sysadmin can sink his teeth into (though he’d rather not). It also tells me that the virus scanning we’re using is just a touch suboptimal, but again, I already knew that. Lump spam scanning in there as well. But that’s why we’re looking to offload both onto another system.
Posted by Rowan Littell at September 21, 2004 08:06 PM