Listening for signals from dead Zopes
A Zope instance that has stopped responding to requests is commonly known as a deadlocked Zope. This is a bit of a misnomer, since you might be dealing with an infinite loop rather than a deadlock.
There are two popular tools to help you debug your Zope: DeadlockDebugger and z3c.deadlockdebugger. Both provide a magic URL you can visit, which returns an overview of the stack frames of all threads. This shows you exactly what Zope is hiding from you.
Both these tools have an Achilles heel: if all your Zope threads are stuck, there is nothing available to process your magic URL. We ran into exactly that situation for a customer project and had to find a solution. Luckily this turned out to be simple: instead of requesting and returning the stack frame data through the web server, why not do it directly on the console? UNIX already provides us with a very useful mechanism for this: signals. Signals also have the benefit of being more secure: they can only be sent by someone who has access to the server and to the account used to run the Zope instance. And thus was born a new product: Products.signalstack.
Once you have installed signalstack in your Zope instance, all you need to do is send a USR1 signal to the Zope process and it dumps the stack frames of all threads to its standard output.
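To make the mechanism concrete, here is a minimal sketch of a signal handler that dumps the stack of every thread. It is not the actual signalstack code: it assumes a Python that provides sys._current_frames() (available since Python 2.5), and the function names format_thread_stacks, dump_on_signal and install are made up for this illustration.

```python
import signal
import sys
import threading
import time
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every running thread."""
    # Map thread ids to names so the dump is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = ["Threads traceback dump at %s" % time.strftime("%Y-%m-%d %H:%M:%S")]
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        lines.append("".join(traceback.format_stack(frame)).rstrip())
    lines.append("End of dump")
    return "\n".join(lines)


def dump_on_signal(signum, frame):
    """Signal handler: write the dump to standard output."""
    sys.stdout.write(format_thread_stacks() + "\n")
    sys.stdout.flush()


def install(signum=None):
    """Register the handler; defaults to SIGUSR1 (hypothetical helper)."""
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(signum, dump_on_signal)
```

Calling install() once at startup is enough for this sketch. Python runs signal handlers in the main thread, so the dump can still be produced while every worker thread is stuck.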
Installing
Installing signalstack is simple: you only need to install the Products.signalstack package in your site. If you are using zc.buildout, just add it to the eggs line for your instance and run buildout:
[instance]
recipe = plone.recipe.zope2instance
zope2-location = ${zope2:location}
eggs =
    Plone
    PIL
    Products.signalstack
If you are not using buildout you can use easy_install to install it either globally or inside your Zope instance.
Using
First you need to figure out the pid (process id) of your Zope instance. You can find this in the var/instance.pid file in your buildout. If you cannot find that file, look for a zope process in your process listing. Once you have found the pid you can send a signal to it:
$ kill -USR1 4361
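The same two steps can be scripted. The sketch below reads the pid from the pid file and sends USR1 with os.kill. The helper names read_pid and request_stack_dump are hypothetical, and the default path assumes you run the script from the buildout directory.

```python
import os
import signal


def read_pid(pid_file):
    """Return the process id stored in a pid file as an integer."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def request_stack_dump(pid_file="var/instance.pid"):
    """Send SIGUSR1 to the process named in pid_file and return its pid."""
    pid = read_pid(pid_file)
    # Equivalent to running: kill -USR1 <pid>
    os.kill(pid, signal.SIGUSR1)
    return pid
```

As with kill on the command line, this only works when run as root or as the account that owns the Zope process.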
Zope will respond in kind by throwing a lot of data at you:
Threads traceback dump at 2008-10-21 11:34:47
Thread -1340051456:
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZServerPublisher.py", line 19, in __init__
    name, a, b=accept()
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZRendezvous.py", line 73, in accept
    l.acquire()
Thread -1340583936 (GET /Plone):
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZServerPublisher.py", line 25, in __init__
    response=b)
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZPublisher/Publish.py", line 401, in publish_module
    environ, debug, request, response)
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZPublisher/Publish.py", line 202, in publish_module_standard
    response = publish(request, module_name, after_list, debug=debug)
  [ .. removed a lot of uninteresting frames here ..]
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/Products/PageTemplates/Expressions.py", line 123, in render
    ob = ob()
  File "/Users/wichert/Development/plone/plone3.2/src/Plone/Products/CMFPlone/browser/ploneview.py", line 287, in showEditableBorder
    request = self.request
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/bdb.py", line 48, in trace_dispatch
    return self.dispatch_line(frame)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/bdb.py", line 66, in dispatch_line
    self.user_line(frame)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pdb.py", line 135, in user_line
    self.interaction(frame, None)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/pdb.py", line 158, in interaction
    self.cmdloop()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/cmd.py", line 130, in cmdloop
    line = raw_input(self.prompt)
Thread -1341648896:
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZServerPublisher.py", line 19, in __init__
    name, a, b=accept()
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZRendezvous.py", line 73, in accept
    l.acquire()
Thread -1341116416:
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZServerPublisher.py", line 19, in __init__
    name, a, b=accept()
  File "/Users/wichert/Library/Zope/Zope-2.10.6-final-py2.4/lib/python/ZServer/PubCore/ZRendezvous.py", line 73, in accept
    l.acquire()
End of dump
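It is immediately obvious that this Zope has a problem: the thread handling GET /Plone is sitting at a pdb prompt, waiting for input in raw_input. Someone forgot to remove a pdb.set_trace() statement just before line 287 of ploneview.py. The other three threads are idle, waiting in accept() for a new request.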
--
Wichert Akkerman