Farm vs Workflow specifics

Post by **knk** » Mon Sep 30, 2024 4:11 pm

Hi guys, this is a very generic question for which I can't seem to find a proper solution.

Suppose there is a 2-machine farm, each machine with 24 available slots, so it can handle up to 48 jobs at once (hail to the RAM & CPU! lol)
Now, if I have a very specific workflow that can only run one job at a time due to several constraints, how can I make this happen?

I've been wrapping my head around this subject and I can't seem to limit the number of executions in such a scenario without cutting the available slots...

Any ideas?

All the best

Post by **emcodem** » Mon Sep 30, 2024 8:26 pm

Hi again,

it means that you need more machines in your farm or expand to cloud like @FranceBB does

Sorry the prio stuff is currently hard to explain.

In case you set to 24 slots ("per prio class"), you can in theory have a max of 5x24 (if not 6x24, not sure about that) jobs running, but only in case you have 5 different workflows, each set to a different priority AND the jobs are started in the lower prio first. Also read here: viewtopic.php?p=8961#p8961

Manually submitted (via FFAStrans GUI) jobs have their own prio class "5", so we have 6 prio classes: 0 to 4 for watchfolders and 5 for manual submits.
Again, it only works when the lower prio class jobs first fill up the max slots and after that, a higher prio class job starts.

In webint job submitter, i just noticed that i don't set the prio according to workflow settings currently, will fix that.

Also, one important note about priorities: usually you don't want to use anything higher than normal, better stick to using very low and low and normal (0,1,2). This is because Prios are also set as CPU prio for the running process. If you encode using high prio and use all CPUs, the Windows GUI and lots of other stuff in windows will freeze. So anything higher than normal should only be used for very short running processing.

There is also a field in ffastrans.json config file called "auto_pause", it would send lower prio jobs to pause in case a very high prio job comes in.

Post by **admin** » Mon Sep 30, 2024 8:53 pm

Hi knk,

You cannot limit a workflow like this. The closest (without adding another host for this one workflow) might be to use the feature where you tell a node (or even all nodes in your workflow) to occupy more slots in order to limit the total executed nodes per host. But a node is limited to occupying 16 slots, so if you have configured your host to 24 you will still have 8 free slots to use. Please also note that an incoming node with higher priority that is set to occupy more slots (above just 1) than are available will hold until that number of free slots is reached.
Just double click the node to configure the "Job processing slots" setting:

[Attachment: job_slots.png]

Hope this helps to somewhat accomplish what you need.

-steinar

Post by **emcodem** » Mon Sep 30, 2024 9:56 pm

But aren't our 2 answers combined the ultimate answer?
E.g. your special workflow is set to a higher prio than everything else in the system, and you set the processor that can only execute once (or all processors) to occupy only one slot, as steinar said?

Post by **FranceBB** » Tue Oct 01, 2024 9:22 am

emcodem wrote: ↑Mon Sep 30, 2024 8:26 pm it means that you need more machines in your farm or expand to cloud like @FranceBB does

I think that what I do isn't applicable to like 99.9% of users, but... sure!
I mean, you don't really have resource issues when you can spin up 640 EC2 in the blink of an eye on AWS ehehehehe
Behold the 640-server FFAStrans 1.4.0.7 Stable cloud farm:

[Attachment: Screenshot from 2024-10-01 10-13-18.png]

The obvious downside is that Amazon isn't a charity fund, it will charge you... a lot...
I'm told that other acceptable forms of payment for the EC2 cost are a kidney, your soul or the life of your first born child. xD
Jokes aside, the cloud is expensive... very expensive... If it were up to me I would have left everything on prem, but as Ben Ben would say "c'est la vie".

Anyway, for a practical on prem approach, I'd say that what Grandmaster suggested is the way to go.
Raising the resources used by a node in your special workflow will hopefully saturate all the slots and prevent the server from picking up more jobs.
They'll still queue up, though, and they're gonna be picked up once it completes.

Post by **knk** » Tue Oct 01, 2024 10:13 am

Hey guys, thanks for the inputs!

This came up when I created a very demanding GPU process inside a specific workflow. From my testing, the machine can keep working on other workflows simultaneously, since everything else can be CPU and RAM demanding but does not seem to interfere much, so I wouldn't "lose" processing capability even though said workflow is running. On the other hand, if the GPU demanding workflow starts in more than 1 instance, the machine can become unresponsive.

My workaround at the moment is very file input based, but it's not "pretty", since it takes 2 workflows and scripts to ensure that only one file ends up on the input folder at a time.

If I understood @Steinar correctly, if I lower one of the machines to 16 slots and on the process step occupy the 16 slots, this will prevent the machine from starting other jobs, right?
It's not ideal, since it takes the machine "down" for a few hours while it does the demanding GPU process, but I'll test it...

Thank you guys!

PS @FranceBB now that is a proper farm! 640 nodes... I hope you have a good pickup truck to run that farm end to end and not the old grandpa's tractor

Post by **emcodem** » Tue Oct 01, 2024 4:05 pm

@knk
ah yeah, as you said you work with nvidia, i recalled that i came up with a solution for a very similar problem here:
viewtopic.php?p=6188&hilit=nvidia+smi#p6188

Not perfect but maybe one of the better alternatives.
It is also the reason why i asked @admin for a "hook" that allows to execute some script before a watchfolder actually starts a job.

Post by **admin** » Tue Oct 01, 2024 5:28 pm

So knk, it's basically the same use case as we have where we use FFAStrans to drive Whisper transcriptions. If you leave your hosts at 24 slots and set the GPU intensive node to utilize 13 slots, then you will still have 11 free slots for other operations but not enough room for two 13 slot nodes on one host. This is how we utilize this feature and it works very well.

-steinar
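
The slot arithmetic described above can be sketched in a few lines of Python. This is only a toy model of per-host slot accounting, assuming a node starts when its slot cost fits into the free slots and waits otherwise. The `Host` class and `try_start` method are hypothetical names for illustration, not part of FFAStrans.

```python
# Hypothetical model of per-host slot accounting; not FFAStrans code.
from dataclasses import dataclass, field


@dataclass
class Host:
    total_slots: int = 24
    running: list = field(default_factory=list)  # slot cost of each running node

    @property
    def free_slots(self) -> int:
        return self.total_slots - sum(self.running)

    def try_start(self, slots: int) -> bool:
        """Start a node only if enough free slots remain; otherwise it waits."""
        if slots <= self.free_slots:
            self.running.append(slots)
            return True
        return False


host = Host(total_slots=24)
first_gpu = host.try_start(13)   # GPU node occupying 13 slots: fits
second_gpu = host.try_start(13)  # needs 13 but only 11 are free: must wait
cpu_jobs = [host.try_start(1) for _ in range(12)]  # ordinary 1-slot nodes

print(first_gpu, second_gpu, sum(cpu_jobs), host.free_slots)
```

Any slot cost above half of the host's total gives the same effect: two such nodes can never share one host, while the remainder stays available for 1-slot nodes.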

Post by **emcodem** » Tue Oct 22, 2024 7:45 pm

emcodem wrote: ↑Tue Oct 01, 2024 4:05 pm viewtopic.php?p=6188&hilit=nvidia+smi#p6188

I just needed a wf that processes one file at a time while the other workflows are not blocked.
The workflow i quoted above works perfectly, i should use it more often

But don't forget to install the plugin processor "wait until file ist oldest"
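
For readers who cannot open the linked post, the general idea of a "process only the oldest file" gate can be sketched in Python. This is one plausible reading of the plugin's name, not its actual implementation: the functions `oldest_file` and `may_start` are hypothetical, and the assumption is that a job may proceed only while its file is the oldest one left in the folder.

```python
# Hypothetical "oldest file first" gate; an illustration, not the plugin's code.
import os
import tempfile
from pathlib import Path
from typing import Optional


def oldest_file(folder: Path) -> Optional[Path]:
    """Return the file with the earliest modification time, or None if empty."""
    files = [p for p in folder.iterdir() if p.is_file()]
    if not files:
        return None
    # Tie-break on the name so the result is deterministic.
    return min(files, key=lambda p: (p.stat().st_mtime, p.name))


def may_start(candidate: Path) -> bool:
    """A job may start only while its file is the oldest one in its folder."""
    return oldest_file(candidate.parent) == candidate


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    a, b = folder / "a.mov", folder / "b.mov"
    a.write_bytes(b"")
    b.write_bytes(b"")
    os.utime(a, (1000, 1000))  # a is older
    os.utime(b, (2000, 2000))  # b is newer
    a_first = may_start(a)     # a is the oldest, so it may run
    b_first = may_start(b)     # b has to wait for a
    a.unlink()                 # a finishes and is removed from the folder
    b_after = may_start(b)     # now b is the oldest

print(a_first, b_first, b_after)
```

Because each waiting job only polls the folder, other workflows keep their slots and are not blocked.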

Post by **emcodem** » Sat Dec 14, 2024 4:43 pm

@mahirat can you be more specific on that question, what kind of resource(s) you mean?
