FFAStrans workflows not balancing between hosts in transcoding farm

trotskylenin
Posts: 7
Joined: Thu Jun 21, 2018 1:46 pm

Post by trotskylenin »

Hello guys.
First of all, I want to congratulate you on the great piece of software you have created :D .
FFAStrans gives the power and flexibility of enterprise software like Telestream Vantage to average users and hobbyists, which makes it unique in its category.
I would be glad to help in any way I can: translating to Spanish or whatever you need.
That said, the reason I'm creating this bug report is that FFAStrans does not seem to balance the workload well between the hosts in a transcoding farm.
Currently I have two PCs running FFAStrans from a common SMB share with a common SMB working directory.
One of these PCs has a Ryzen 5 processor, an NVIDIA GTX 1650 and 16 GB of DDR4 RAM; the other is a first-generation Intel Core i5 with an NVIDIA GTX 1050 and 8 GB of DDR3 RAM.
Even though the first PC is a lot faster than the second, most of the jobs seem to start on the slower one. At first I thought the faster PC was not processing anything at all, because some files even got queued. But later I tried a different workflow that only adds subtitles to an MKV file, and the files I put in there went through on the faster PC: the lightest workflow was done by the most powerful machine while the weaker one was struggling to convert 4 files to HEVC :lol:

So my questions are:
How exactly does load balancing work when you have multiple hosts running FFAStrans?
Is there a way to change how workflows or jobs are assigned to specific hosts?

Thank you in advance for your help.
emcodem
Posts: 1646
Joined: Wed Sep 19, 2018 8:11 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by emcodem »

Hey there,

Really good question, and a pretty simple answer:
Unless you configure your workflow to run only on one specific host, all hosts in the farm compete against each other to grab new job ticket files.
Both hosts scan the tickets folder every quarter of a second for new job tickets. The host that happens to be first gets the job, provided it decides that it has free slots.
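
A minimal sketch of that competition, for illustration only. It assumes a ticket is claimed by renaming the file, which is a common pattern on shared folders but is not confirmed in this thread as what FFAStrans does internally; `try_claim`, `scan_once`, the `.json` extension and the host names are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def try_claim(ticket, host, running, max_slots):
    """Try to claim one ticket for `host`; return True if this host won."""
    if running >= max_slots:           # the host decides it has no free slot
        return False
    claimed = ticket.with_name(f"{ticket.name}.{host}")
    try:
        os.rename(ticket, claimed)     # assumption: a rename is the claim step
        return True
    except FileNotFoundError:          # another host grabbed the ticket first
        return False


def scan_once(tickets_dir, host, running, max_slots):
    """One polling pass: claim tickets until the host's slots are full."""
    won = []
    for ticket in sorted(tickets_dir.glob("*.json")):
        if try_claim(ticket, host, running + len(won), max_slots):
            won.append(ticket.name)
    return won


def demo():
    """Four tickets, two hosts with 3 slots each; the slow host scans first."""
    with tempfile.TemporaryDirectory() as d:
        tickets = Path(d)
        for i in range(4):
            (tickets / f"job{i}.json").write_text("{}")
        slow = scan_once(tickets, "slow-pc", running=0, max_slots=3)
        fast = scan_once(tickets, "fast-pc", running=0, max_slots=3)
        return slow, fast


if __name__ == "__main__":
    print(demo())
```

In this toy run the host that happens to scan first fills its slots regardless of how powerful it is, and the other host only gets what is left over.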
emcodem, wrapping since 2009 you got the rhyme?
admin
Site Admin
Posts: 1659
Joined: Sat Feb 08, 2014 10:39 pm

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by admin »

Hi trotskylenin,

Due to the masterless design of FFAStrans, hosts don't know much about each other, as there is no master to tell them what to do. However, there's always room for improvement. One thing we could look at is adding some additional delay once a host has taken a job, so that other hosts get a chance at the next one. It should be trivial to implement and I'll take a look at how that behaves.
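
To illustrate the delay idea, here is a toy model rather than FFAStrans code. It assumes one new ticket arrives per polling round and that the same host always polls first; `distribute` and `backoff_polls` are hypothetical names, and real timing on a network share would be far less regular.

```python
def distribute(num_jobs, hosts, max_slots, backoff_polls):
    """Simulate polling rounds and return the number of jobs taken per host.

    After taking a job, a host sits out `backoff_polls` of its own polls.
    """
    taken = {h: 0 for h in hosts}
    wait = {h: 0 for h in hosts}
    arrived = 0
    pending = 0
    while arrived < num_jobs or pending > 0:
        if arrived < num_jobs:         # one new ticket shows up per round
            arrived += 1
            pending += 1
        if all(taken[h] >= max_slots for h in hosts):
            break                      # every slot is full; the rest stays queued
        for h in hosts:                # fixed order: hosts[0] is always "first"
            if wait[h] > 0:
                wait[h] -= 1           # still backing off after its last job
            elif pending > 0 and taken[h] < max_slots:
                taken[h] += 1
                pending -= 1
                wait[h] = backoff_polls
    return taken


if __name__ == "__main__":
    print(distribute(4, ["A", "B"], max_slots=4, backoff_polls=0))
    print(distribute(4, ["A", "B"], max_slots=4, backoff_polls=1))
```

Under these assumptions, host A takes all four jobs without a back-off, while a back-off of one poll gives each host two jobs.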

Thanks for the kind words and for reporting! Also, thank you for your offer to help :-) You can PM me if you have something particular in mind besides the translation. FFAStrans will stay native English.

-steinar
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by Silicon »

Hi there
Unfortunately I have to confirm that jobs are sometimes assigned to the transcoding nodes in such a way that some nodes are overloaded while others are running just one job or none. See the attached screenshot:
- I have 4 transcoding nodes in the farm, with hostnames "GRFCODER3" (Win10), "PR-CARB-SRV-2" (still WS2008), "PR-CARB-SRV-3" (WS2012) and "TEST-TRANSCODER" (WS2012)
- in the screenshot you can see that "TEST-TRANSCODER" is overloaded with 7 jobs, 3 of them in the "Waiting for resources" state, while the others have just one job or even none ("PR-CARB-SRV-2")
Is there any way to improve job distribution (besides using the "List of exclude / include hosts" for static assignment of individual workflows)?
Thank you
Attachments
FFAStrans status monitor - one transcode node overloaded while others not utilized.PNG (59.21 KiB)
BR,
Silicon
--------
FFAStrans 1.3.0.2; WebInterface 1.3.0.0
Manager: VM: 2x Xeon E5-2630v3@2.4GHz, 8GB RAM
Workers: 3x HP DL360 G9 (2x Xeon E5-2643v3@3.4GHz,16GB RAM, nVidia M2000)+ 2x Lenovo SR665 (2x AMD EPYC730216C@3.0GHz,128GB RAM, nVidia P2200)
emcodem
Posts: 1646
Joined: Wed Sep 19, 2018 8:11 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by emcodem »

OK, so while @admin is on vacation (not sure whether he reads or posts in the next 2 weeks), I'd like to elaborate on the options here. Up front: there is no really clean way to get around this as it is now. The only place where we can influence job distribution is the workflow's "include/exclude hosts" section, and this section is not really designed to be modified dynamically.

So this forum thread alone has 2 pretty different requirements regarding job distribution in a farm environment:

1) @trotskylenin would basically like to prefer one server over the other, e.g. one has a faster CPU and jobs should only hit the other node when the faster machine's slots are full

2) @Silicon would just like to see an equal distribution between all farm nodes

Looking at how this works in other similar software, we usually only have the option for 2) (equal distribution), and in my experience it is even the standard/default.
Adding functionality to support 1) would most likely end up in a mess for both developer and user, because the number of different requirements is high and they vary a lot (e.g. consider the running job's ETA plus the estimated ETA of the new job, plus the presence, maximum performance and current utilisation of graphics boards).

So after all, I really believe that equal job distribution (2) is something that FFAStrans should do out of the box, but in order to keep it maintainable, predictable, testable and usable, the FFAStrans core should not support any more complex job distribution decisions.
Taking this further, I can well imagine supporting complex use cases like 1), but not with built-in graphical user interface support. Instead, one should be able to provide one's own script logic for job distribution, so we'd just need some kind of defined hook at the point in exe_manager where the decision happens whether a node takes a job or not. This way virtually any requirement can potentially be fulfilled.
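
A rough sketch of what such a hook could look like. No such hook exists in FFAStrans according to this thread; `HostState`, `prefer_hosts` and `equal_distribution` are made-up names. The sketch also assumes every host can see the slot usage of the others, which the masterless design does not provide today.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    name: str
    running: int      # jobs currently running on this host
    max_slots: int    # configured slot limit


def has_free_slot(host):
    return host.running < host.max_slots


def prefer_hosts(preferred):
    """Requirement 1): other hosts only take jobs once preferred hosts are full."""
    def hook(me, farm):
        if not has_free_slot(me):
            return False
        if me.name in preferred:
            return True
        return all(not has_free_slot(farm[p]) for p in preferred if p in farm)
    return hook


def equal_distribution(me, farm):
    """Requirement 2): take a job only if no host with a free slot is less loaded."""
    if not has_free_slot(me):
        return False
    least = min(h.running for h in farm.values() if has_free_slot(h))
    return me.running <= least


if __name__ == "__main__":
    farm = {
        "fast": HostState("fast", running=1, max_slots=4),
        "slow": HostState("slow", running=0, max_slots=2),
    }
    hook = prefer_hosts({"fast"})
    print(hook(farm["slow"], farm))                 # slow host defers to the fast one
    print(equal_distribution(farm["slow"], farm))   # slow host is the least loaded
```

Each host would evaluate the hook for itself before grabbing a ticket, so the core only needs to call one function and honour its yes/no answer.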
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by Silicon »

Hi emcodem
Thanks for the reaction. I need to clarify one statement you made: "I really believe that equal job distribution (2) is something that ffastrans should do out of the box." Does this mean that FFAStrans already has this feature (if so, it doesn't work :( ), or that it should have it in your opinion?

And one more practical question: is there any way to manually reassign a task to another transcoding node in the farm?
Thanks
BR,
Silicon
emcodem
Posts: 1646
Joined: Wed Sep 19, 2018 8:11 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by emcodem »

Hey,
No, the status quo is as described above; it is just my opinion that it should do equal distribution. In other words, I vote for this feature enhancement (but mostly driven by the thought that "others do it too").

The only way currently to influence host assignment is to alter the include/exclude hosts BEFORE the job is started. So yes, you can manually reassign by cancelling the job, altering the workflow to include only the host of interest and restarting the job (which of course nobody will ever really do). I played with the thought of automating something in this area, but it is not yet meant to be controlled by a third party in an automated fashion, so I'll leave it for now.
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by Silicon »

OK, thanks for the clarification. One more question: did you change anything between versions 1.2.0 and 1.2.1 that might cause the behaviour I have described? I hadn't noticed it on version 1.2.0 (but on the other hand I have recently introduced 3 new compute-intensive workflows, so it might just be coincidence that it did not appear before).
BR,
Silicon
emcodem
Posts: 1646
Joined: Wed Sep 19, 2018 8:11 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by emcodem »

Hmmm, I just compared some of the corresponding functions, but it doesn't look like any of them changed between 1.2 and 1.2.1.
Some things you only notice when taking a deep look at them :D

In the end, if you mostly do the same heavy stuff, e.g. transcoding to XDCAM, and you configure your slots so that a machine only reaches around 90% CPU utilisation when all slots are full, you wouldn't really suffer from the way it currently works. It is only problematic when you have multiple different workflows where one can utilise 100% of a machine and another only utilises 10% over a long time period.
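
A back-of-the-envelope sketch of that slot arithmetic. The 30% per-job figure is an assumed example value, not a measurement, and `slots_for` and `cpu_demand` are hypothetical helpers.

```python
def slots_for(target_pct, per_job_pct):
    """Largest slot count that keeps total CPU use at or below target_pct."""
    return max(1, target_pct // per_job_pct)


def cpu_demand(jobs_pct):
    """Total CPU demand of the jobs in the slots, in percent of one machine."""
    return sum(jobs_pct)


if __name__ == "__main__":
    # Similar heavy jobs, assumed to need about 30% CPU each.
    slots = slots_for(90, 30)
    print(slots, cpu_demand([30] * slots))   # slots full at the 90% target

    # The same slot count with very different workflows.
    print(cpu_demand([10] * slots))          # slots full, machine mostly idle
    print(cpu_demand([100] * slots))         # slots full, machine oversubscribed
```

With uniform jobs the slot limit works as a reasonable stand-in for load; with mixed workflows a host can report full slots while being either nearly idle or heavily overloaded.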
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: FFAStrans workflows not balancing between hosts in transcoding farm

Post by Silicon »

Hi emcodem
Yes, I have 32 "import workflows" configured, converting from different professional and consumer codecs to XDCAM HD422 or IMX50 (for SD).
A week ago I started adding "export workflows" (only 3 of them so far), converting (mostly) from XDCAM HD422 to H.264 High@L4.2 with a station logo or timecode overlay.
I think that is when the "not balancing" problems started.
At the moment I have 4 transcoding nodes processing both imports and exports. To avoid a situation where all imports are blocked or delayed, I'll try assigning the export workflows to only 2 of these 4 transcoding nodes. I hope it will improve the situation a little bit :roll:
BR,
Silicon