[Sidefx-houdini-list] Distributed sims failing

Gary Jaeger gary at corestudio.com
Tue Jan 31 10:11:04 EST 2017


Thanks, Rob.

Link below. Since I was having trouble with my scene, I tried a vanilla splash tank, which also failed. It could certainly be user error, but it's odd that it works for a while and generates cache files before it eventually craps out. I should also say that HQueue is working fine generating IFD files and rendering out EXRs. The render farm itself works fine; we usually use Rush with Maya/V-Ray/AE/C4D.

Thanks again. Curious to know what you find.

https://www.dropbox.com/sh/ade8scd1c0kbxrj/AACxnyVl7QUm_J_FOHu88jPKa?dl=0

Gary Jaeger / 650.728.7957 direct / 415.518.1419 mobile
http://corestudio.com
> On Jan 31, 2017, at 6:33 AM, rvinluan <rvinluan at sidefx.com> wrote:
> 
> Hi Gary,
> 
> If I were to guess, it looks like a general network disruption. I'm basing that on the "Write pipe error" messages. Perhaps a disruption occurs while a message is being passed from one machine to another, so the message arrives incomplete?
> 
> Anyway, would you be able to post the job output and diagnostics files for one of the failed slice jobs, along with the .hip file?
> 
> I can give it a whirl here and see whether it's an issue with HQueue or with distributed sims.
> 
> Cheers,
> Rob
> 
> On 2017-01-29 7:08 PM, Gary Jaeger wrote:
>> A quick follow-up. I watched one of the machines on the farm doing a sim (I
>> just did a quick splash tank to make sure it wasn't my scene). Early on, the
>> CPUs are pegged and everything moves along. RAM is not even close to being
>> an issue; the process is using about 2GB. It looks like maybe it's the
>> hython process? Anyway, at some point the CPU usage just drops to nothing.
>> The process is still alive, but doesn't seem to be doing anything.
>> 
>> On Sun, Jan 29, 2017 at 12:30 PM, Gary Jaeger <gary at corestudio.com> wrote:
>> 
>>> Does anybody have any insight into this? I have a FLIP sim that I want to
>>> distribute. I'm pretty sure it's all set up correctly, because the sim
>>> starts and all the slices get part of the way through, but they always end
>>> up failing.
>>> 
>>> I've tried both slice and slice along. When I try slice along, the job
>>> has been getting about 14% through, then just hanging. No error
>>> messages, etc. It just never progresses. I've also tried slice, chopping
>>> the sim into 4 quadrants. In that case I was seeing things like this:
>>> 
>>> 
>>> ALF_PROGRESS 27%
>>> ALF_PROGRESS 28%
>>> Read error on ack: Error Occurred of 12
>>> Error occurred in message 12 state is 5
>>> ---- Pump enters error status ----
>>> Tracker reports an error, aborting
>>> 
>>> 
>>> ALF_PROGRESS 27%
>>> ALF_PROGRESS 28%
>>> Tracker reports an error, aborting
>>> Write pipe error: Error Occurred offset 0 of 4
>>> Error occurred in message 4 state is 9
>>> EOF in pipe at position 0 of 4
>>> Error occurred in message 4 state is 6
>>> 
>>> --
>>> 
>>> The tracker task isn't reporting any errors in HQueue that I can
>>> see, though.
>>> 
>>> Any ideas?
>>> 
>>> 
>>> 
>>> --
>>> Gary Jaeger // Core Studio
>>> 249 Princeton Avenue
>>> Half Moon Bay, CA 94019
>>> 650.728.7957 (direct) • 650.728.7060 (main)
>>> http://corestudio.com
>>> 
>> 
>> 
> 


