Long timeout on openTCPClientConnection

nibbol · December 22, 2008, 8:09am

I have a multiuser environment I am working on and am having a problem with the peer to peer server I have built.

When a user logs on, their client starts a P2P server that listens for connections. When another user enters the same area, that user’s client attempts to connect to the first user’s server. If the first user is still online, everything works as it is supposed to. But if the first user is no longer running the program, the second user’s connection attempt takes forever to acknowledge that it was unable to connect, regardless of the timeout length.

The connecting code:

AuthConn = self.cManager.openTCPClientConnection(remAddress, Port,1)

Even with a connection timeout of 1 millisecond, it still takes about 30-40 seconds to move past this point in the code and decide that AuthConn equals None.

drwr · December 22, 2008, 7:43pm

Yes, this is a known bug. We don’t always respect the timeout on connect, sorry about that.

David

treeform · December 23, 2008, 12:23am

I have it in my code as

self.myConnection=self.manager.openTCPClientConnection(ip,self.port,0)

and it works great. If a server is not available, it goes on to the next one.
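A minimal sketch of that "try the next one" loop, assuming `manager` behaves like the connection manager used above (openTCPClientConnection(host, port, timeout_ms) returning None on failure). The helper name `connect_first_available` and the 3000 ms default are made up for illustration:

```python
def connect_first_available(manager, hosts, port, timeout_ms=3000):
    """Return (host, connection) for the first host that accepts, else (None, None).

    Assumption: manager.openTCPClientConnection(host, port, timeout_ms)
    returns None when the connection cannot be made, as described in this thread.
    """
    for host in hosts:
        connection = manager.openTCPClientConnection(host, port, timeout_ms)
        if connection is not None:
            return host, connection
    return None, None
```

Keeping the timeout as a parameter makes it easy to compare a 0 timeout against something more forgiving for distant servers.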

drwr · December 23, 2008, 12:52am

It works fine in 1.6.0, with the SIMPLE_THREADS switch. It doesn’t work in other contexts.

Not sure if a 0-second timeout is a great idea, though. You might overlook some otherwise perfectly good servers that just happen to be a few milliseconds farther away.

David

nibbol · December 23, 2008, 6:09am

1.6.0? Where is that, when will it be out???

In the meantime I have had to work around this by increasing the vigilance of the main server so that it doesn’t hand out references to users who have already gone offline. That still leaves room for this scenario to come up occasionally.
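One way such a server-side check could look, as a rough sketch only: the `PeerRegistry` class, the heartbeat idea, and the 10-second limit are all assumptions for illustration, not what the actual server does.

```python
import time


class PeerRegistry:
    """Tracks when each peer last checked in and hides stale ones (hypothetical)."""

    def __init__(self, max_silence_seconds=10.0, clock=time.monotonic):
        self._max_silence = max_silence_seconds
        self._clock = clock
        self._last_seen = {}  # (address, port) -> time of last heartbeat

    def heartbeat(self, address, port):
        """Record that this peer is still alive."""
        self._last_seen[(address, port)] = self._clock()

    def remove(self, address, port):
        """Forget a peer that logged off cleanly."""
        self._last_seen.pop((address, port), None)

    def live_peers(self):
        """Drop peers that have been silent too long; return the rest, sorted."""
        now = self._clock()
        stale = [peer for peer, seen in self._last_seen.items()
                 if now - seen > self._max_silence]
        for peer in stale:
            del self._last_seen[peer]
        return sorted(self._last_seen)
```

A peer that crashes between two heartbeats can still be handed out once, so this narrows the window rather than closing it.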

Is it the same or similar bug that gets in the way of this?

	self.http = HTTPClient()

	self.channel=self.http.getDocument(URLSpec(URL))

I use this to get an up-to-date list of class versions from the server when the client starts up. Occasionally the reply seems to get lost in the ether, and it takes quite a while before the getDocument call times out.

treeform · December 23, 2008, 6:35am

David, actually that has not happened yet. I had people connect from all over the world to my server with that code.

drwr · December 23, 2008, 8:31pm

This is not a bug, but part of the design: the getDocument() call is blocking, and the system TCP/IP implementation decides how long the timeout is. (I’m not even sure there is an option to change the timeout length for blocking connect() calls, which is why we have the bug in the low-level case: the bug is that we provide an interface to specify the timeout, but we don’t have any way to pass that timeout request to the OS.)

If you want to avoid the timeout issues, you have to use the non-blocking interface instead. This means you do:

from panda3d.core import DocumentSpec, Filename  # pandac.PandaModules in older releases

self.channel = self.http.makeChannel(True)
self.channel.beginGetDocument(DocumentSpec(URL))
self.channel.downloadToFile(Filename('foo.txt'))  # or use downloadToRam
while self.channel.run():
    pass  # do some other stuff here between polls
if not self.channel.isValid():
    pass  # handle error condition
# Read the document.

In the future 1.6.0 release (not yet scheduled), you could also solve this problem using the normal blocking interfaces, but putting it in a thread instead. Threading, of course, introduces its own set of problems.
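A rough sketch of the threaded approach in plain Python, assuming the build supports Python threads alongside the blocking call. The helper name `call_with_timeout` is made up, and note that it only stops waiting: the blocked call itself keeps running in the background until the OS gives up.

```python
import queue
import threading


def call_with_timeout(blocking_call, timeout_seconds):
    """Run blocking_call() in a daemon thread; return (finished, result)."""
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put((True, blocking_call()))
        except Exception as error:
            results.put((False, error))

    threading.Thread(target=worker, daemon=True).start()
    try:
        succeeded, value = results.get(timeout=timeout_seconds)
    except queue.Empty:
        return False, None  # still blocked; the worker thread is left running
    if not succeeded:
        raise value  # re-raise the worker's exception in the caller
    return True, value
```

It could be used along the lines of call_with_timeout(lambda: self.http.getDocument(URLSpec(URL)), 5.0), checking the first element of the result before touching the channel.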

David

drwr · December 24, 2008, 12:20am

My own comment made me realize the obvious solution to this bug: we always open the socket in non-blocking mode internally, which allows us to implement our own timeout; and then switch over to blocking mode afterwards.
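For illustration, here is the same idea in plain Python sockets. This is a sketch of the general technique, not the actual Panda3D change, and the function name `connect_with_timeout` is made up. The hostname lookup is still blocking and is not covered by the timeout.

```python
import errno
import select
import socket

# connect_ex() results that mean "the connection attempt is still in progress"
PENDING = (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY)


def connect_with_timeout(host, port, timeout_seconds):
    """Return a connected, blocking socket, or None on failure or timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        status = sock.connect_ex((host, port))
        if status in PENDING:
            # Wait until the attempt finishes (writable or failed) or time runs out.
            _, writable, failed = select.select([], [sock], [sock], timeout_seconds)
            if not (writable or failed):
                raise OSError("connect timed out")
            status = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if status != 0:
            raise OSError(status, "connect failed")
        sock.setblocking(True)  # switch back to blocking mode for normal use
        return sock
    except OSError:
        sock.close()
        return None
```

The caller gets an ordinary blocking socket back, so nothing downstream needs to know the connect was done in non-blocking mode.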

I’ve just made this fix (and will check it in shortly). It will allow us to implement the timeout properly in the future, with or without the definition of SIMPLE_THREADS.

This has nothing to do with the HTTPClient connect timeout, though. That’s still based on OS-level timeouts, which we don’t have any control over.

David