We recently ran into a problem in our production environment in which the survey emperor couldn't serve more requests because our system ran out of file descriptors. When we investigated, we found a very large number of fds per vassal (about 200 per vassal), and an even larger number for the emperor itself (starting fairly small after a restart, then creeping upwards to the thousands). Here's a sample for a vassal:
# ls -l /proc/46248/fd
total 0
lr-x------ 1 s1 s1 64 May 7 08:54 0 -> /dev/null
lrwx------ 1 s1 s1 64 May 7 08:54 1 -> /home/log_intellisurvey/7.0/uwsgi/survey/rni05407104p.log
lrwx------ 1 s1 s1 64 May 7 08:54 10 -> socket:[1281427374]
lrwx------ 1 s1 s1 64 May 7 08:54 100 -> socket:[1281440771]
lrwx------ 1 s1 s1 64 May 7 08:54 101 -> socket:[1281440772]
lrwx------ 1 s1 s1 64 May 7 08:54 102 -> socket:[1281440773]
lrwx------ 1 s1 s1 64 May 7 08:54 103 -> socket:[1281440774]
lrwx------ 1 s1 s1 64 May 7 08:54 104 -> socket:[1281440775]
lrwx------ 1 s1 s1 64 May 7 08:54 105 -> socket:[1281440776]
lrwx------ 1 s1 s1 64 May 7 08:54 106 -> socket:[1281440777]
lrwx------ 1 s1 s1 64 May 7 08:54 107 -> socket:[1281440778]
lrwx------ 1 s1 s1 64 May 7 08:54 108 -> socket:[1281440779]
lrwx------ 1 s1 s1 64 May 7 08:54 109 -> socket:[1281440780]
lrwx------ 1 s1 s1 64 May 7 08:54 11 -> socket:[1281432897]
lrwx------ 1 s1 s1 64 May 7 08:54 110 -> socket:[1281440781]
lrwx------ 1 s1 s1 64 May 7 08:54 111 -> socket:[1281440782]
lrwx------ 1 s1 s1 64 May 7 08:54 112 -> socket:[1281440783]
lrwx------ 1 s1 s1 64 May 7 08:54 113 -> socket:[1281440784]
lrwx------ 1 s1 s1 64 May 7 08:54 114 -> socket:[1281440785]
lrwx------ 1 s1 s1 64 May 7 08:54 115 -> socket:[1281440786]
lrwx------ 1 s1 s1 64 May 7 08:54 116 -> socket:[1281440787]
lrwx------ 1 s1 s1 64 May 7 08:54 117 -> socket:[1281440788]
lrwx------ 1 s1 s1 64 May 7 08:54 118 -> socket:[1281440789]
lrwx------ 1 s1 s1 64 May 7 08:54 119 -> socket:[1281440790]
lrwx------ 1 s1 s1 64 May 7 08:54 12 -> /home/log_intellisurvey/7.0/uwsgi/survey_emperor.log
lrwx------ 1 s1 s1 64 May 7 08:54 120 -> socket:[1281440791]
lrwx------ 1 s1 s1 64 May 7 08:54 121 -> socket:[1281440792]
lrwx------ 1 s1 s1 64 May 7 08:54 122 -> socket:[1281440793]
lrwx------ 1 s1 s1 64 May 7 08:54 123 -> socket:[1281440794]
... and so on, for a total of 220, mostly sockets.
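To make listings like the one above easier to compare across vassals and the emperor, the symlink targets can be tallied by type. This is only a minimal sketch: the function names `classify_target` and `summarize_fds` are made up for illustration, and it assumes a Linux-style /proc/<pid>/fd directory in which each entry is a symlink such as `socket:[1281427374]` or a file path.

```python
import os
from collections import Counter


def classify_target(target):
    """Map an fd symlink target to a coarse category (hypothetical helper)."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    # Anything else is treated as a regular path (log file, /dev/null, ...).
    return "file"


def summarize_fds(fd_dir):
    """Count the entries of an fd directory by category.

    fd_dir is expected to look like /proc/<pid>/fd: one symlink per open fd.
    """
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd may have been closed between listdir() and readlink().
            continue
        counts[classify_target(target)] += 1
    return counts
```

For the vassal shown above, this would be called as `summarize_fds("/proc/46248/fd")`, run as a user allowed to read that directory. Sampling the emperor's PID the same way at intervals after a restart should show which category is the one that keeps growing.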
For the emperor, again it is mostly socket connections. Reading the docs and searching for file descriptors, I see a recommendation for "close-on-exec". We will try that, but I also thought I'd post in case anyone else has seen this kind of behavior and has any recommendations.
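For anyone unfamiliar with the term, close-on-exec is a per-descriptor flag (FD_CLOEXEC) at the OS level. The sketch below only illustrates how that flag is set and read on a socket with the standard `fcntl` interface; it is not how uWSGI implements its option, and the helper names `set_cloexec` and `has_cloexec` are made up for the example.

```python
import fcntl
import socket


def set_cloexec(fd, enabled=True):
    """Set or clear the FD_CLOEXEC flag on a file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)


def has_cloexec(fd):
    """Return True if the descriptor will be closed on exec()."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_cloexec(sock.fileno(), True)
cloexec_is_set = has_cloexec(sock.fileno())
sock.close()
```

One caveat worth keeping in mind: the flag only takes effect when a process calls exec. A child created by fork alone still inherits every open descriptor, so the flag by itself would not explain or fix descriptors inherited through plain forking.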
We are running uWSGI 2.1 with Perl and using a few advanced features such as the fork-server option, so perhaps that has something to do with it? Thanks in advance for any comments or advice.
Rob Messer