Discussion:
[uWSGI] Monitoring backlog queue
Jean-Baptiste Quenot
2011-02-01 13:36:28 UTC
Permalink
Hi there,

Is it possible to monitor the size of the backlog queue (number of
incoming requests waiting for a worker to handle the request)? And
the max wait time?

I'm looking for stats similar to pgbouncer's 'show pools'
http://pgbouncer.projects.postgresql.org/doc/usage.html

As stated in pgbouncer's doc: maxwait == How long the first (oldest)
client in queue has waited, in seconds. If this starts increasing,
then the current pool of servers does not handle requests quick
enough. Reason may be either overloaded server or just too small of a
pool_size setting.

Thanks,
--
Jean-Baptiste Quenot
Roberto De Ioris
2011-02-01 13:56:50 UTC
Permalink
Post by Jean-Baptiste Quenot
Hi there,
Is it possible to monitor the size of the backlog queue (number of
incoming requests waiting for a worker to handle the request)?
This is already done in the master process if you use TCP sockets on Linux
(there is no way to do it for UNIX sockets). It alerts you when the backlog
is getting full, and I can easily export this value to the uwsgi API.
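
For reference, a minimal sketch of how such a backlog check can be done on
Linux (reading struct tcp_info via getsockopt(); the field offsets assume
the layout from linux/tcp.h, and this is only an illustration, not uWSGI's
actual code):

import socket
import struct

def listen_queue_stats(sock):
    # For a listening TCP socket on Linux the kernel reports the current
    # backlog length in tcpi_unacked and the configured limit in
    # tcpi_sacked, two 32-bit fields at offsets 24 and 28 of
    # struct tcp_info (layout assumed from linux/tcp.h).
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 32)
    queued, limit = struct.unpack_from("II", info, 24)
    return queued, limit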

Post by Jean-Baptiste Quenot
And the max wait time?
This is harder, as the web server must pass the timestamp at which it
started sending the request. Some months ago I sent a patch to the nginx
list to export a $time variable, but it got rejected :(
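
For illustration, assuming the web server could be configured to pass a
request-start timestamp (for example via a uwsgi_param; the X_REQUEST_START
name below is hypothetical), the application could compute the wait time
itself:

import time

def application(environ, start_response):
    # Hypothetical: the web server passes the epoch time at which it
    # received the request, e.g. nginx: uwsgi_param X_REQUEST_START $msec;
    # (whether $msec is usable there depends on the nginx version).
    started = environ.get("X_REQUEST_START")
    if started:
        queue_wait = time.time() - float(started)
        environ["wsgi.errors"].write("queue wait: %.3fs\n" % queue_wait)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]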
--
Roberto De Ioris
http://unbit.it
t***@smail.ee
2011-02-01 14:24:34 UTC
Permalink
Hello Roberto,

can you also export the currently used core count ("busy cores" during
async)? Or is that already done?
Roberto De Ioris
2011-02-03 20:18:05 UTC
Permalink
Post by t***@smail.ee
Hello Roberto,
can you also export the currently used core count ("busy cores" during
async)? Or is that already done?
I do not think it would be "reliable".

Take this situation:

0 running
1 running
2 unused
3 running
4 unused
5 running
6 unused

The value would report 7 cores running, even though only 4 are actually
busy...

It would be better to add another counter that increments after every
accept() and decrements after every close().

I will add this in 0.9.7 (I think its async stack will be finished in a
couple of days).
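
For illustration, a minimal sketch of that accept()/close() counter idea
(the names are illustrative, not uWSGI's internals); unlike scanning core
slots, the counter always reflects the number of requests actually in
flight:

class BusyCounter:
    # Incremented on accept(), decremented on close(): the difference is
    # the number of in-flight requests, regardless of which async core
    # slots they happen to occupy.
    def __init__(self):
        self.in_flight = 0

    def on_accept(self):
        self.in_flight += 1

    def on_close(self):
        self.in_flight -= 1

counter = BusyCounter()
counter.on_accept()   # a request arrives on some core slot
counter.on_accept()   # another one, on a non-adjacent slot
counter.on_close()    # the first request finishes
print(counter.in_flight)   # -> 1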
--
Roberto De Ioris
http://unbit.it
Jean-Baptiste Quenot
2011-02-14 14:50:41 UTC
Permalink
Post by Roberto De Ioris
Post by Jean-Baptiste Quenot
Is it possible to monitor the size of the backlog queue (number of
incoming requests waiting for a worker to handle the request)?
This is already done in the master process if you use TCP sockets on Linux
(there is no way to do it for UNIX sockets). It alerts you when the backlog
is getting full, and I can easily export this value to the uwsgi API.
I switched back to TCP sockets and got this in the uWSGI output in production:

Mon Feb 14 15:18:27 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:28 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:29 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:30 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:31 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:32 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:33 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:34 2011 - *** uWSGI listen queue of socket 4 full !!!
(128/128) ***
SIGPIPE: writing to a closed pipe/socket/fd (probably the client
disconnected) on request /status !!!
write(): Broken pipe [plugins/python/wsgi_subhandler.c line 186]

I'd like to monitor the listen queue, in order to diagnose whether we are
constantly "near the limit" or whether it was just an "accident". Can you
export the variable, please?

BTW what's the status of the syslog support? Can I use --log-syslog
reliably, and get the above messages at warn or err level? This would
be really nice!

Thanks in advance,
--
Jean-Baptiste Quenot
Roberto De Ioris
2011-02-14 19:50:15 UTC
Permalink
Post by Jean-Baptiste Quenot
Post by Roberto De Ioris
Post by Jean-Baptiste Quenot
Is it possible to monitor the size of the backlog queue (number of
incoming requests waiting for a worker to handle the request)?
This is already done in the master process if you use TCP sockets on Linux
(there is no way to do it for UNIX sockets). It alerts you when the backlog
is getting full, and I can easily export this value to the uwsgi API.
Mon Feb 14 15:18:27 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:28 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:29 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:30 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:31 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:32 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:33 2011 - *** uWSGI listen queue of socket 4 full !!!
(129/128) ***
Mon Feb 14 15:18:34 2011 - *** uWSGI listen queue of socket 4 full !!!
(128/128) ***
SIGPIPE: writing to a closed pipe/socket/fd (probably the client
disconnected) on request /status !!!
write(): Broken pipe [plugins/python/wsgi_subhandler.c line 186]
I'd like to monitor the listen queue, in order to diagnose whether we are
constantly "near the limit" or whether it was just an "accident". Can you
export the variable, please?
If I remember correctly you run with full logging disabled (so reporting
it in the logs would be useless). Maybe the master could print the 'percent'
variation? Something like

*** listen queue is now 10 % full
*** listen queue is now 20 % full
*** listen queue is now 10 % full
*** listen queue is now 30 % full

Or do you want to manage it in your code, accessing something like
env['uwsgi.listen_queue']?
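
For illustration, a minimal sketch of the "print only on percent change"
idea (the 10% bucketing is an assumption; uWSGI's master would do this
in C):

last_bucket = None

def report_listen_queue(queued, limit):
    # Log only when the fill level crosses into a new 10% bucket, so a
    # queue hovering at the same level does not flood the logs.
    global last_bucket
    bucket = min(queued * 100 // limit, 100) // 10 * 10
    if bucket != last_bucket:
        last_bucket = bucket
        print("*** listen queue is now %d %% full" % bucket)

for q in (13, 27, 14, 39):
    report_listen_queue(q, 128)   # prints 10, 20, 10, 30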
Post by Jean-Baptiste Quenot
BTW what's the status of the syslog support? Can I use --log-syslog
reliably, and get the above messages at warn or err level? This would
be really nice!
Oh my god, I had totally forgotten it :P

Honestly, I have to recheck that part of the code, but it is very simple.

It practically creates a socketpair between the workers and the master.
Every time a worker writes to fd 2, the data is sent to the master, which
forwards it to syslog.
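
For illustration, a self-contained toy version of that mechanism (not
uWSGI's code): the master creates a socketpair, the worker dups its end
over fd 2, and whatever the worker writes to stderr is relayed to syslog:

import os
import socket
import syslog

master_end, worker_end = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Worker: replace fd 2 (stderr) with its end of the socketpair, so
    # anything written to stderr travels to the master.
    os.dup2(worker_end.fileno(), 2)
    os.write(2, b"worker says: something happened\n")
    os._exit(0)

# Master: relay whatever arrives from the worker to syslog.
syslog.openlog("uwsgi-sketch")
syslog.syslog(syslog.LOG_WARNING, master_end.recv(4096).decode(errors="replace"))
os.waitpid(pid, 0)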
--
Roberto De Ioris
http://unbit.it
Jean-Baptiste Quenot
2011-02-15 09:32:52 UTC
Permalink
Post by Roberto De Ioris
If I remember correctly you run with full logging disabled (so reporting
it in the logs would be useless). Maybe the master could print the 'percent'
variation? Something like
*** listen queue is now 10 % full
*** listen queue is now 20 % full
*** listen queue is now 10 % full
*** listen queue is now 30 % full
Or do you want to manage it in your code, accessing something like
env['uwsgi.listen_queue']?
Yes, I'd like to manage it with something like
uwsgi.stats['listen_queue']. A 'stats' dictionary in the uwsgi module
sounds better than WSGI variables.
--
Jean-Baptiste Quenot
Roberto De Ioris
2011-02-21 06:27:14 UTC
Permalink
Post by Jean-Baptiste Quenot
Post by Roberto De Ioris
If I remember correctly you run with full logging disabled (so reporting
it in the logs would be useless). Maybe the master could print the 'percent'
variation? Something like
*** listen queue is now 10 % full
*** listen queue is now 20 % full
*** listen queue is now 10 % full
*** listen queue is now 30 % full
Or do you want to manage it in your code, accessing something like
env['uwsgi.listen_queue']?
Yes, I'd like to manage it with something like
uwsgi.stats['listen_queue']. A 'stats' dictionary in the uwsgi module
sounds better than WSGI variables.
--
Jean-Baptiste Quenot
Sadly, I cannot go with dictionaries for now, as in threading mode they
get clobbered (and I do not want to add another lock to the stack ;) )

I will export a uwsgi.listen_queue() function that will return the value
taken from a shared area (no locking is needed, as the value is updated
only by the master).
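
For illustration, a sketch of how an application could expose that value
once the function exists (guarding for builds that do not export it yet):

import uwsgi   # only importable inside a uWSGI process

def application(environ, start_response):
    # Read the master-maintained listen queue value; fall back gracefully
    # on uWSGI builds without the function.
    try:
        queued = uwsgi.listen_queue()
    except AttributeError:
        queued = -1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("listen_queue: %d\n" % queued).encode()]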
--
Roberto De Ioris
http://unbit.it
Jean-Baptiste Quenot
2011-05-31 13:23:36 UTC
Permalink
Post by Roberto De Ioris
Post by Jean-Baptiste Quenot
Post by Roberto De Ioris
If I remember correctly you run with full logging disabled (so reporting
it in the logs would be useless). Maybe the master could print the 'percent'
variation? Something like
*** listen queue is now 10 % full
*** listen queue is now 20 % full
*** listen queue is now 10 % full
*** listen queue is now 30 % full
Or do you want to manage it in your code, accessing something like
env['uwsgi.listen_queue']?
Yes, I'd like to manage it with something like
uwsgi.stats['listen_queue']. A 'stats' dictionary in the uwsgi module
sounds better than WSGI variables.
Sadly, I cannot go with dictionaries for now, as in threading mode they
get clobbered (and I do not want to add another lock to the stack ;) )
I will export a uwsgi.listen_queue() function that will return the value
taken from a shared area (no locking is needed, as the value is updated
only by the master).
Hi Roberto,

I'm thinking again about monitoring the listen queue. There are
various problems:

1) listen_queue() will never give numbers above the maximum queue
size, so it can only be useful for knowing that the queue is full.
So maybe providing a boolean listen_queue_full() instead would make
more sense.

2) as a matter of fact, the error message that is printed in the logs
does not need to mention tcpi_unacked, as it will never exceed the
queue size + 1. For example, with only one worker, only one slot in
the listen queue and many concurrent requests, here is what the
messages look like:

Tue May 31 10:32:06 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:07 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:08 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:09 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:10 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:11 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:12 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:13 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:14 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:15 2011 - *** uWSGI listen queue of socket 4 full !!! (1/1) ***

3) I tried tweaking the error message to mention the tcpi_lost field,
but it is always zero.

So to sum up, when the queue is getting full, the sysadmin cannot
determine the severity of the downtime just by looking at the size of
the listen queue, because it remains constant at the maximum size (e.g.
at 128), and checking an instantaneous value periodically is not
reliable. Knowing the number of dropped requests would be great, but
this information is not available on Linux (it may be available on
FreeBSD). So I believe the only option is to add a new counter in the
master, incremented every time the listen queue is found full.

I would monitor this counter and use a DERIVE data source in RRD to
graph the severity of my uWSGI app's downtime.

Does it make sense?
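
For illustration, a sketch of the RRD side of that idea using the
python-rrdtool bindings (assumed installed; the file and data source
names are made up):

import rrdtool   # python-rrdtool bindings, assumed installed

# A DERIVE data source turns the ever-increasing "queue was full"
# counter into a rate: how often the queue filled up per second.
rrdtool.create(
    "uwsgi_backlog.rrd",
    "--step", "60",
    "DS:queue_full:DERIVE:120:0:U",
    "RRA:AVERAGE:0.5:1:1440",
)

counter_value = 42   # would be read from the uWSGI master
rrdtool.update("uwsgi_backlog.rrd", "N:%d" % counter_value)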

And last but not least, querying the uwsgi python module involves
doing an HTTP request and getting a worker to serve this internal
request just to learn the value of a counter. Or do you have another
way in mind to access the number? (Sending a signal to the master, as
Java does for getting a stack trace; providing a UNIX socket
connection to fetch the data with socat, as haproxy does; or
providing a monitoring port that talks HTTP?)
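
For illustration, a toy sketch of the haproxy-style option: a UNIX socket
that dumps the counter on connect, queryable with socat (the path and the
output format below are made up):

import os
import socket

SOCK_PATH = "/tmp/uwsgi-stats.sock"   # hypothetical path
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(1)

queue_full_counter = 0   # would live in the master's shared area

# Each connection receives the current value and is closed, so a check
# from the shell is:  socat - UNIX-CONNECT:/tmp/uwsgi-stats.sock
while True:
    conn, _ = srv.accept()
    conn.sendall(("listen_queue_full: %d\n" % queue_full_counter).encode())
    conn.close()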

Thanks in advance,
--
Jean-Baptiste Quenot
Roberto De Ioris
2011-06-01 08:13:38 UTC
Permalink
Post by Jean-Baptiste Quenot
Post by Roberto De Ioris
Post by Jean-Baptiste Quenot
Post by Roberto De Ioris
If I remember correctly you run with full logging disabled (so reporting
it in the logs would be useless). Maybe the master could print the 'percent'
variation? Something like
*** listen queue is now 10 % full
*** listen queue is now 20 % full
*** listen queue is now 10 % full
*** listen queue is now 30 % full
Or do you want to manage it in your code, accessing something like
env['uwsgi.listen_queue']?
Yes, I'd like to manage it with something like
uwsgi.stats['listen_queue']. A 'stats' dictionary in the uwsgi module
sounds better than WSGI variables.
Sadly, I cannot go with dictionaries for now, as in threading mode they
get clobbered (and I do not want to add another lock to the stack ;) )
I will export a uwsgi.listen_queue() function that will return the value
taken from a shared area (no locking is needed, as the value is updated
only by the master).
Hi Roberto,
I'm thinking again about monitoring the listen queue.
1) listen_queue() will never give numbers above the maximum queue
size, so it can only be useful for knowing that the queue is full.
So maybe providing a boolean listen_queue_full() instead would make
more sense.
2) as a matter of fact, the error message that is printed in the logs
does not need to mention tcpi_unacked, as it will never exceed the
queue size + 1. For example, with only one worker, only one slot in
the listen queue and many concurrent requests, here is what the
messages look like:
Tue May 31 10:32:06 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:07 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:08 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:09 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:10 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:11 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:12 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:13 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:14 2011 - *** uWSGI listen queue of socket 4 full !!! (2/1) ***
Tue May 31 10:32:15 2011 - *** uWSGI listen queue of socket 4 full !!! (1/1) ***
Hi, I report this because users often do not know that even if they create a
very large listen queue, the limit must also be tuned on the kernel side (via
sysctl and so on). This way they can see that something is wrong.
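
For illustration, on Linux the backlog passed to listen() is silently
capped by net.core.somaxconn, so a quick check of the kernel-side limit
could look like this:

# Backlogs passed to listen() are capped at this value on Linux; raise
# it with e.g.: sysctl -w net.core.somaxconn=4096
with open("/proc/sys/net/core/somaxconn") as f:
    print("kernel backlog cap:", f.read().strip())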
Post by Jean-Baptiste Quenot
3) I tried tweaking the error message to mention the tcpi_lost field,
but it is always zero.
I tried to check this field some time ago, but I think it refers to the TCP
window size, and even for that purpose it does not look very useful.
Post by Jean-Baptiste Quenot
So to sum up, when the queue is getting full, the sysadmin cannot
determine the severity of the downtime just by looking at the size of
the listen queue, because it remains constant at the maximum size (eg
at 128), and it is not reliable to check an instantaneous value
periodically. Knowing the number of dropped requests would be great,
but this information is not available on Linux (may be available on
FreeBSD). So I believe the only option is to add a new counter in the
master, that will be incremented everytime the listen queue has been
checked full.
I would monitor this counter and use a DERIVE data source in RRD to
graph the severity of my uWSGI app's downtime.
Does it make sense?
And last but not least, querying the uwsgi python module involves
doing an HTTP request and getting a worker to serve this internal
request just to learn the value of a counter. Or do you have another
way in mind to access the number? (Sending a signal to the master, as
Java does for getting a stack trace; providing a UNIX socket
connection to fetch the data with socat, as haproxy does; or
providing a monitoring port that talks HTTP?)
Look at this:

http://projects.unbit.it/uwsgi/wiki/ManagementFlag

I have added two flags for the backlog queue (the second one is what you asked for).

You need the admin plugin to use management flags:

python uwsgiconfig --plugin plugins/admin

then add "--plugins admin" to your command line/config

(or you can embed it in the build profile)

Look at this script if you want to make a monitor app:

http://projects.unbit.it/uwsgi/browser/myadmin.py

(In the future I will add something like --admin-get and --admin-set, as I did with --ping.)


--
Roberto De Ioris
http://unbit.it
