[Sidefx-houdini-list] Distributed sims failing

rvinluan rvinluan at sidefx.com
Tue Jan 31 09:33:53 EST 2017


Hi Gary,

If I were to guess, it almost looks like a general network disruption.  
I'm basing that on the "Write pipe error" messages. Perhaps a disruption 
occurs while a message is passed from one machine to another so the 
message is incomplete?
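
To make the "incomplete message" idea concrete, here is a minimal Python sketch of a reader that expects fixed-size messages and stops when the other end goes away. It is not HQueue's or Houdini's actual protocol. The 4-byte length header and the names recv_exact, recv_message and demo are assumptions for illustration, chosen only because "position 0 of 4" in the log reads like a 4-byte read that received nothing.

```python
import socket
import struct


def recv_exact(sock, size):
    """Read exactly `size` bytes, or raise EOFError if the peer closes first."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:  # recv() returns b"" once the peer has closed the connection
            raise EOFError(f"EOF at position {len(buf)} of {size}")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock):
    """Hypothetical framing: a 4-byte big-endian length header, then the payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo():
    results = []

    # Case 1: the sender writes a complete message.
    a, b = socket.socketpair()
    payload = b"slice 3 boundary data"
    a.sendall(struct.pack("!I", len(payload)) + payload)
    results.append(recv_message(b).decode())
    a.close()
    b.close()

    # Case 2: the sender goes away before writing the header.
    a, b = socket.socketpair()
    a.close()
    try:
        recv_message(b)
    except EOFError as exc:
        results.append(str(exc))
    b.close()

    # Case 3: the header arrives, but the payload is cut short.
    a, b = socket.socketpair()
    a.sendall(struct.pack("!I", 100) + b"partial")
    a.close()
    try:
        recv_message(b)
    except EOFError as exc:
        results.append(str(exc))
    b.close()

    return results


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Running it prints the complete message, then "EOF at position 0 of 4" and "EOF at position 7 of 100". In other words, a connection that drops before or during a message shows up on the reading side as a short read, not as a clean error from the sender.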

Anyway, would you be able to post the job output and diagnostics files 
for one of the failed slice jobs?  And also the .hip file?
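
If the job output is long, a small script can pull out the last reported progress and any error lines first. This is a hypothetical helper, not part of HQueue. It assumes the output contains "ALF_PROGRESS N%" lines like the ones quoted further down in this thread, and the error keywords are simply taken from the messages in that quote.

```python
import re
import sys

PROGRESS_RE = re.compile(r"ALF_PROGRESS\s+(\d+)%")
# Hypothetical keyword list, based only on the messages quoted in this thread.
ERROR_MARKERS = ("error", "aborting", "eof in pipe")


def triage(log_text):
    """Return (last progress percentage or None, list of error-looking lines)."""
    last_progress = None
    errors = []
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last_progress = int(match.group(1))
        elif any(marker in line.lower() for marker in ERROR_MARKERS):
            errors.append(line.strip())
    return last_progress, errors


if __name__ == "__main__":
    # Usage: python triage.py path/to/job_output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        progress, errors = triage(fh.read())
    print("last progress:", "none" if progress is None else f"{progress}%")
    for line in errors:
        print(line)
```

Seeing where each slice stopped alongside the error lines makes it easier to tell whether all slices fail at the same point.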

I can give it a whirl here and see if it's an issue with HQueue or 
distributed sims.

Cheers,
Rob

On 2017-01-29 7:08 PM, Gary Jaeger wrote:
> A quick follow-up. I watched one of the machines on the farm doing a sim. I
> just did a quick splash tank to make sure it wasn't my scene. Early on the
> CPUs are pegged and everything moves along. RAM is not even close to being
> an issue, the process is using up about 2GB. Looks like maybe it's the
> hython process? Anyway, at some point the CPU usage just drops to nothing.
> The process is still alive, but doesn't seem to be doing anything.
>
> On Sun, Jan 29, 2017 at 12:30 PM, Gary Jaeger <gary at corestudio.com> wrote:
>
>> Anybody have any insight into this? I have a flip sim that I want to
>> distribute. I'm pretty sure it's all set up correctly, because the sim
>> starts and all the slices get part of the way through, but always end up
>> failing.
>>
>> I've tried both slice and slice along. When I try a slice along, the job
>> has been getting about 14% through, then just hanging up. No error
>> messages, etc. It just never progresses. I've also tried slice and chopping
>> the sim into 4 quadrants. In that case I was seeing things like this:
>>
>>
>> ALF_PROGRESS 27%
>> ALF_PROGRESS 28%
>> Read error on ack: Error Occurred of 12
>> Error occurred in message 12 state is 5
>> ---- Pump enters error status ----
>> Tracker reports an error, aborting
>>
>>
>> ALF_PROGRESS 27%
>> ALF_PROGRESS 28%
>> Tracker reports an error, aborting
>> Write pipe error: Error Occurred offset 0 of 4
>> Error occurred in message 4 state is 9
>> EOF in pipe at position 0 of 4
>> Error occurred in message 4 state is 6
>>
>> --
>>
>> Though the tracker task isn't reporting any errors in hqueue that I can
>> see.
>>
>> Any ideas?
>>
>>
>>
>> --
>> Gary Jaeger // Core Studio
>> 249 Princeton Avenue
>> Half Moon Bay, CA 94019
>> 650.728.7957 (direct) • 650.728.7060 (main)
>> http://corestudio.com
>>
>
>
