About Duffbert...

Duffbert's Random Musings is a weblog semi/sorta related to IBM/Lotus Notes & Domino software, but I don't let that be a limiting criterion. I'm Thomas Duff.


Boy, I haven't had that bad a day as a developer in a LONG time...

Category Software Development

We moved a new Notes app onto the server on Friday, and we opened it up for the registration function this morning at around 7 am. Within about 10 minutes, the server's CPU was pegged at 100%. Not good...

We modified a keyword document to close off registration, dropped the users from the application, and everything went back to normal (10% to 20% utilization).

Now, I would be OK with this if I could pinpoint WHY this new app was going crazy. Functionality-wise, it was working fine in development. Obviously, I hadn't tested it under the load of a large number of users all starting to register at once. Since the outage, I've streamlined a few lookups, changed a few graphics to image resources, dropped a couple of views that had some level of calculation in a column, and dropped a field (that I didn't need) that was being built as a text list when a document was saved.

We'll try again tomorrow morning with the new changes, as well as with the server and Notes admins monitoring things when we open it back up again.

I'd understand it if I had written a Java agent that wasn't releasing memory, or if I had agents stuck in a loop. But there's no Java code in there, and there's only one agent, scheduled to run once an hour (and I can tell from OpenLog that it ran for all of one second during our crisis). I'm wondering if the @PickList function used to pick people for your "team" was having fits, since the view of eligible people was being heavily added to (new registrations) and deleted from (people chosen for a team) during that initial registration rush.

Sigh...  I don't like being incompetent...

Comments

1 - Did your admin friends tell you *which* Domino task was chewing up the CPU cycles? I'm having a hard time thinking of something that could peg the CPU in a standard forms-driven Notes client app, but it'd help to know exactly what the server was spending all the effort on...

2 - nserver, from what I understand... That's what the server dude was seeing. Hopefully I'll be a bit more "together" tomorrow to ask and investigate a bit more. Something about knowing it was YOUR app that is bringing down the production server... :)

3 - Great, nserver doesn't tell you much; that one covers a lot of ground. Must be some sort of infinite loop/race condition scenario. Either that, or you've stumbled across a Domino bug (all together now, "No! It can't be!")...

4 - Your admin didn't turn off name caching in order to speed up the registrations, did he? That would have pegged the CPU in a heartbeat.

5 - @3... We'll find out this morning when we fire it back up. :)

@4... That's not the problem. It's not "registration" as in directory stuff. It's "registration" as in "set a field in a document that says you're now part of this event, and create a logsheet for them."

6 - You don't have an @Today in a view selection formula, I hope? When I started at my current job, they had a performance issue with a view that had an @Today formula. The tricky part was that the complaint recurred on a three-month cycle (or something like that). It was cyclical because the view contained more documents during the busy time of year, when there were more documents overall.
I had one yesterday where a developer added the refresh property to the form. The users picked up on it immediately, even though the delay was not really significant. Performance issues are tough to troubleshoot sometimes.

7 - Hi, Curt... no @Today (fortunately!). I did some more paring this morning, after the second 7 am test still failed. If/when I ever get this solved, I'll recap my ineptitude for all to learn from. :)

8 - Don't fret too much. I've crashed my fair share of servers in my time. I just hope the admins don't notice it's my app that's bringing the server to its knees. Nothing crashes a server in a matter of seconds like an infinite loop that sends an email.

9 - Tom,
You've probably checked for this already, but it has bitten me on the arse a couple of times: an unintentional LotusScript recursive call, or a recursion that can't find a way out.
Steve
