pool size, threads and TIME_WAIT
We are now at the stage where we put everything together for the new TWIMPACT. Our main analysis, done by Mikio, now runs on Scala, the database is based on Cassandra, and we have created a messaging infrastructure to distribute further analysis and aggregation tasks.
Now was the time to do some testing on real hardware to see whether it would give us what we want. A first test using a single analysis thread ran fast at first, but throughput dropped to very low numbers once it was storing the data. So he decided to increase the number of threads doing the job, and the performance increase was immense. However, at some point our little test program started spitting out strange socket errors about not being able to "assign the requested address".
It turned out that the host system was accumulating open sockets up to the maximum of around 30k, and from then on it did not allow any new sockets to be opened. However, most of these sockets were in state TIME_WAIT, which indicates a socket that has been closed but not yet fully released by the system.
This looked very strange: I had wrapped commons-pool in a very straightforward way, and the socket factory also looked simple enough. Enabling extensive debug output revealed only normal activity until the errors started. Also, the actual pool size never really went above 10. And that's where an idea that had been lurking in the back of my mind came rushing forward: the pool's maximum number of idle elements is set to 8 by default, which leads to the following problem:
- 16 threads take out a socket from the pool (not all 16 are actually active at the same time for some reason).
- The pool creates new sockets on demand.
- After doing a single task the threads put back the socket.
- The pool sometimes counts the number of idle sockets and closes some to get back to the maximum idle count.
- The closed sockets sit in the system in state TIME_WAIT.
As all this happens very fast, the pool closes and creates sockets in quick succession, and the system keeps accumulating closed sockets until it breaks.
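The churn described in the steps above can be reproduced with a toy model. The following is only a sketch under stated assumptions, not the actual commons-pool code: `ToyPool`, `FakeSocket` and `simulate` are hypothetical names, no real sockets are opened, the pool closes surplus objects immediately on return, and all workers are assumed to hold a socket at the same time (the worst case). It only counts how many sockets would be created and closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a pool with a cap on idle objects. It does not open real sockets.
class PoolChurnSketch {

    /** Stand-in for a pooled socket (hypothetical, no network I/O). */
    static final class FakeSocket {}

    /** Simplified pool: creates on demand, closes returned objects beyond maxIdle. */
    static final class ToyPool {
        private final int maxIdle;
        private final Deque<FakeSocket> idle = new ArrayDeque<>();
        int created = 0;
        int closed = 0;

        ToyPool(int maxIdle) {
            this.maxIdle = maxIdle;
        }

        FakeSocket borrow() {
            FakeSocket s = idle.pollFirst();
            if (s == null) {          // nothing idle: create a new one on demand
                created++;
                s = new FakeSocket();
            }
            return s;
        }

        void giveBack(FakeSocket s) {
            if (idle.size() >= maxIdle) {
                closed++;             // a real socket would now linger in TIME_WAIT
            } else {
                idle.addFirst(s);
            }
        }
    }

    /** Returns {created, closed} after all workers borrow and return once per round. */
    static int[] simulate(int workers, int maxIdle, int rounds) {
        ToyPool pool = new ToyPool(maxIdle);
        for (int r = 0; r < rounds; r++) {
            List<FakeSocket> inUse = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                inUse.add(pool.borrow());
            }
            for (FakeSocket s : inUse) {
                pool.giveBack(s);
            }
        }
        return new int[] { pool.created, pool.closed };
    }

    public static void main(String[] args) {
        int[] capped = simulate(16, 8, 1000);
        int[] sized = simulate(16, 16, 1000);
        System.out.println("maxIdle=8:  created=" + capped[0] + " closed=" + capped[1]);
        System.out.println("maxIdle=16: created=" + sized[0] + " closed=" + sized[1]);
    }
}
```

In this model, 16 workers against an idle cap of 8 create 8008 sockets and close 8000 over 1000 rounds, while an idle cap of 16 creates 16 sockets and closes none. The real pool and real threads will not line up this neatly, but the mechanism is the same: every round, the sockets above the idle cap are closed and then recreated.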
Setting the pool size to 16 with 16 worker threads works beautifully. Also, we had 32 open sockets (1 server, 1 client each) for 16 TSocket connections using the Thrift API, which sped up the accumulation of open sockets.
Conclusion: Set the pool size, including the maximum number of idle elements, according to your number of worker threads.