[Sidefx-houdini-list] Distributed sims failing
gary at corestudio.com
Sat Feb 11 11:59:28 EST 2017
I thought I’d post an update. Out of desperation I completely disabled the ethernet ports on a group of 4 machines and instead connected them to the farm via WiFi. And, uh, that worked. Repeatedly and reliably. So somewhere in the wired LAN (switch, cables, built-in NIC, ethernet drivers, etc.) something is screwing with HQueue and distributed sims. Well, it’s a start…
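For anyone who wants to narrow down the same kind of wired-vs-WiFi difference outside of Houdini, here is a rough sketch of a repeated TCP transfer test between two machines. It is not part of HQueue and is not what distributed sims actually do on the wire; the function names (`start_echo_server`, `probe`), the payload size, and the timeout are all made up for illustration. One machine runs the server, the other calls `probe` against it many times and counts connections that fail, stall past the timeout, or come back short.

```python
import socket
import socketserver
import threading
import time


class CountHandler(socketserver.BaseRequestHandler):
    """Read until the client closes its send side, then reply with the byte count."""

    def handle(self):
        total = 0
        while True:
            data = self.request.recv(65536)
            if not data:
                break
            total += len(data)
        self.request.sendall(str(total).encode())


def start_echo_server(host="0.0.0.0", port=0):
    """Start the counting server in a background thread; returns (server, bound_port).

    Hypothetical helper: port=0 lets the OS pick a free port.
    """
    server = socketserver.ThreadingTCPServer((host, port), CountHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def probe(host, port, rounds=100, payload_size=1 << 20, timeout=5.0):
    """Open `rounds` connections, send `payload_size` bytes on each.

    Returns (failures, slowest_seconds). A round counts as a failure if the
    connection errors out, times out, or the server reports a different
    byte count than was sent.
    """
    payload = bytes(payload_size)
    failures, slowest = 0, 0.0
    for _ in range(rounds):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(payload)
                conn.shutdown(socket.SHUT_WR)  # tell the server we are done sending
                reply = b""
                while True:
                    chunk = conn.recv(64)
                    if not chunk:
                        break
                    reply += chunk
                if int(reply) != payload_size:
                    failures += 1
        except (OSError, ValueError):
            failures += 1
        slowest = max(slowest, time.monotonic() - start)
    return failures, slowest
```

Run `start_echo_server(port=5000)` on one node (the port number is arbitrary), then call `probe("that-node", 5000)` from another, once over ethernet and once over WiFi. A nonzero failure count or a very large slowest time on the wired path only would point at the switch, cabling, NIC, or driver rather than at Houdini itself.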
Gary Jaeger / 650.728.7957 direct / 415.518.1419 mobile
> On Feb 9, 2017, at 6:42 AM, Gary Jaeger <gary at corestudio.com> wrote:
> OK, thanks. Haven’t heard from support how to do it either.
> Distributed simming fails for us about 50% of the time, in ways and on machines that seem completely random. Pairs of machines might work 5 times in succession and then fail. Same test scene, same submit, same everything. At this point I’ve done enough testing to conclude that either:
> a) something is wrong with our network (or networking on the mac) that causes failure only half the time. or
> b) something is wrong with distributed sims on the mac
> Now our farm works for everything else: v-ray, mr, maya, c4d, ae, nuke, houdini (ifd and mantra) so if it’s our network then there is something specific to distributed sims that stresses *something* to the point that it fails for this one specific task.
> Before I start going nuts with the troubleshooting - like pulling some of the machines off the farm, using different switches, etc - I have a couple questions for the masses:
> 1. does ANYONE here use a mac farm for distributed sims?
> 2. do people use HQueue or something else?
> As for 2, we use Rush for everything other than houdini, but I’d be happy to use something like deadline. But I’m not sure if Deadline simply uses hqueue ‘under the hood’ so to speak, and if so if our problem would just be the same.
> thanks for any thoughts.
> Gary Jaeger / 650.728.7957 direct / 415.518.1419 mobile
> http://corestudio.com
>> On Feb 7, 2017, at 6:56 AM, Andy Nicholas <andy at ANDYNICHOLAS.COM> wrote:
>> No, sorry. Not tackled that yet.
>> On 07/02/2017 14:35, Gary Jaeger wrote:
>>> Any clues on how to set this up for sliced, distributed sims? As it is, we’re only getting one of the slices in the .sim file.
>>> Gary Jaeger / 650.728.7957 direct / 415.518.1419 mobile
>>> http://corestudio.com
>>>> On Feb 5, 2017, at 1:01 PM, Andy Nicholas <andy at ANDYNICHOLAS.COM> wrote:
>>>> Have a look at Checkpoints on the Cache tab of the ROP.
>>> Sidefx-houdini-list mailing list
>>> Sidefx-houdini-list at sidefx.com