
2010-07-15

Created by arte.

pool size, threads and TIME_WAIT

We are now at the stage where we put everything together for the new TWIMPACT. Our main analysis, written by Mikio, now runs on Scala, the database is based on Cassandra, and we have created a messaging infrastructure to distribute further analysis and aggregation tasks.

Now it was time to do some testing on real hardware to see whether it would give us what we want. A first test using a single analysis thread ran fast initially, but slowed to very low numbers once it started storing data. So Mikio decided to increase the number of threads doing the job, and the performance gain was immense. However, at some point our little test program started spitting out strange socket errors about not being able to "assign the requested address".

It turned out that the host system was accumulating open sockets up to the maximum of around 30k, and from then on it did not allow any new sockets to be opened. However, most of these sockets (easily seen with netstat -an) were in the TIME_WAIT state: already closed, but not yet released by the TCP stack.
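To put a rough number on this: with a typical Linux ephemeral port range and the usual TIME_WAIT lifetime, a fairly modest churn rate is already enough to exhaust all local ports. The figures below are common Linux defaults, assumed here for illustration, not values measured on our host:

```java
// Back-of-the-envelope arithmetic for how fast socket churn exhausts
// ephemeral ports. Both constants are typical Linux defaults (assumed).
public class PortExhaustion {
    static final int EPHEMERAL_PORTS = 60999 - 32768 + 1; // default local port range, ~28k
    static final int TIME_WAIT_SECONDS = 60;              // how long a closed socket lingers

    // Closing more sockets per second than this eventually leaves
    // no free local port to bind a new connection to.
    public static int criticalChurnRate() {
        return EPHEMERAL_PORTS / TIME_WAIT_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(criticalChurnRate() + " closed sockets/s exhaust the port range");
    }
}
```

At a few hundred closed sockets per second, which a fast analysis loop reaches easily, the port range is gone within a minute.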

This looked very strange: I had wrapped commons-pool in a very straightforward way, and the socket factory also looked simple enough. Enabling extensive debug output only revealed normal activity until the errors started, and the actual pool size never went above 10. And that's when the idea lurking in the back of my mind came rushing forward: the pool's maximum number of idle elements defaults to 8, and that was the problem.

  • 16 threads borrow a socket from the pool (though for some reason not all 16 are actually active at the same time).
  • The pool creates new sockets on demand.
  • After completing a single task, each thread returns its socket.
  • The pool periodically checks the number of idle sockets and closes some to get back down to the maximum idle count.
  • The closed sockets sit in the system in TIME_WAIT.
As all this happens very fast, the pool closes and creates new sockets rapidly, and the system accumulates closed sockets until it breaks.
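The cycle above can be sketched with a toy pool model. This is a simulation of the eviction behaviour, not commons-pool itself; all numbers are illustrative, and for simplicity every thread is active on every cycle, which was not quite true in our test:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an object pool with a maxIdle cap, showing how a worker
// count above maxIdle forces constant create/close churn.
public class PoolChurn {

    // Returns how many sockets get closed (and would sit in TIME_WAIT)
    // over the given number of borrow/return cycles.
    public static int runCycles(int threads, int maxIdle, int cycles) {
        int created = 0;
        int destroyed = 0;
        Deque<Integer> idle = new ArrayDeque<>();
        for (int c = 0; c < cycles; c++) {
            // All worker threads borrow; the pool creates on demand.
            Integer[] borrowed = new Integer[threads];
            for (int t = 0; t < threads; t++) {
                borrowed[t] = idle.isEmpty() ? ++created : idle.pop();
            }
            // All threads return; the pool evicts down to maxIdle.
            for (int t = 0; t < threads; t++) {
                if (idle.size() < maxIdle) {
                    idle.push(borrowed[t]);
                } else {
                    destroyed++; // closed socket -> TIME_WAIT
                }
            }
        }
        return destroyed;
    }

    public static void main(String[] args) {
        // 16 threads against the default maxIdle of 8: 8 sockets are
        // closed and reopened on every single cycle.
        System.out.println(runCycles(16, 8, 1000));
        // 16 threads with maxIdle = 16: nothing is ever evicted.
        System.out.println(runCycles(16, 16, 1000));
    }
}
```

With maxIdle below the thread count, the excess sockets are evicted on every cycle, so the TIME_WAIT backlog grows with the task rate rather than the thread count.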

Setting the pool size to 16 for 16 worker threads works beautifully. Note also that each of the 16 TSocket connections of the Thrift API means two open sockets (one on the server, one on the client), 32 in total, which sped up the accumulation of open sockets.

Conclusion: Set the pool size, and in particular the maximum idle count, according to your number of worker threads.
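With commons-pool, which we use via a thin wrapper, the fix boils down to raising both limits to the worker count. A minimal configuration sketch against the commons-pool 1.x API, with a trivial stand-in factory instead of our actual Thrift socket factory:

```java
import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;

public class PoolSetup {
    public static void main(String[] args) throws Exception {
        // Stand-in for the real factory, which would open a Thrift TSocket.
        BasePoolableObjectFactory factory = new BasePoolableObjectFactory() {
            @Override
            public Object makeObject() {
                return new Object();
            }
        };

        GenericObjectPool pool = new GenericObjectPool(factory);
        int workerThreads = 16;
        pool.setMaxActive(workerThreads); // never hand out more than we have workers
        pool.setMaxIdle(workerThreads);   // crucial: the default of 8 caused the churn

        Object socket = pool.borrowObject();
        try {
            // ... do one task ...
        } finally {
            pool.returnObject(socket);    // back to the idle list, not into TIME_WAIT
        }
    }
}
```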



thinkberg
subconscious opinions
Copyright © 2005-2008 Matthias L. Jugel | SnipSnap 1.0b3-uttoxeter