A file dropped to watchfolder processed by both transcode nodes in a farm

Post by **Silicon** » Fri Sep 04, 2020 12:53 pm

Hi folks.
First of all THANKS A LOT for FFAStrans ... a great tool!

I have the following farm setup:
- 1 FFAStrans running in a VM (Windows Server 2008 R2 SP1) and acting as "manager node"
- 2 FFAStrans running on physical machines (one is WS2008 R2 SP1, the other is WS2012 R2) and acting as "transcode nodes"
- I'm using version 1.0.0.5 of FFAStrans, which is running as a service on all three hosts; the system is set up to use local cache
- both the source watchfolder and the target dropfolder are located on an EMC Isilon cluster (OneFS 8.0.0.1) and accessed via UNC path

I have encountered the following problem: a file dropped to the watchfolder is erroneously processed by both transcode nodes in the farm:
- transcode node #1 picks up the source file
- approx. 4 sec later, transcode node #2 picks up the same source file as well

Fortunately this issue doesn't occur very often (3 times so far), but I would like to solve it anyway.
Any ideas what is causing the issue?

Thanks for help,
Silicon

P.S. The workflow is quite simple:
- it moves XML files from the source watchfolder to the target dropfolder
- it converts all non-PCM/WAV files to PCM/WAV and drops the result to the target dropfolder
- it converts all non-XDCAM-compliant files to IMX50/MXF or XDCAM HD50/MXF (depending on resolution) and drops the result to the target dropfolder

Post by **emcodem** » Fri Sep 04, 2020 2:17 pm

Hey Silicon,
welcome to the forum and thank you for using FFAStrans.

This issue is probably connected to a problem with concurrent file access to the installation files or, more exactly, the /db files of FFAStrans.
Are the installation files also located on your Isilon, or is your "manager node" serving them through an SMB share?

Post by **Silicon** » Fri Sep 04, 2020 2:33 pm

Hi emcodem
All FFAStrans binary and db files are located on my "manager node" and accessed from the "transcode nodes" through an SMB share.
BR,
Silicon

Post by **admin** » Fri Sep 04, 2020 9:57 pm

Hi Silicon,

You're using the exact same setup as I do on my in-production test farm: 1 VM hosting the program files and db share, and 7-8 standalone hosts ranging from Win 8.1 to Server 2016. This works quite well for us, and we don't have this kind of issue. But due to the "masterless" design of FFAStrans, this is actually something we cannot guarantee will never happen. However, there are some things you can do to improve it. First, you should not use 2008 servers anymore. Best would be to use something newer like 2016, but even 2012 would be a better option than 2008. I have found that this is less likely to happen on newer OSes like Win 10/Server 2016. You can also try setting these parameters in the registry:

reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v DirectoryCacheLifetime /t REG_DWORD /d 0
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v FileNotFoundCacheLifetime /t REG_DWORD /d 0
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v FileInfoCacheLifetime /t REG_DWORD /d 0

Just run these three lines on your servers (as admin). This causes your servers to "ask" the share for every file operation instead of relying on a cache.
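To confirm that the three values really are set on a given host, something like the following could be used. This is only a sketch: the helper names `read_cache_lifetimes` and `all_disabled` are made up for illustration, and it relies on Python's Windows-only `winreg` module, so the registry read only works on the Windows hosts themselves.

```python
import sys

# Registry location and value names taken from the three "reg add" lines above.
KEY = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUES = (
    "DirectoryCacheLifetime",
    "FileNotFoundCacheLifetime",
    "FileInfoCacheLifetime",
)


def read_cache_lifetimes():
    """Return {value_name: data or None} for the three cache values (hypothetical helper)."""
    import winreg  # standard library, but only available on Windows

    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as key:
        for name in VALUES:
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                data = None  # value not present, so Windows falls back to its default
            result[name] = data
    return result


def all_disabled(lifetimes):
    """True only if every cache lifetime is explicitly set to 0."""
    return all(value == 0 for value in lifetimes.values())


if __name__ == "__main__" and sys.platform == "win32":
    current = read_cache_lifetimes()
    for name, value in current.items():
        print(f"{name} = {value}")
    print("caches disabled:", all_disabled(current))
```

Running it on each farm host shows whether any of them was missed when applying the settings.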
Hope this might help.

-steinar

Post by **Silicon** » Sat Sep 05, 2020 8:39 am

Hi Steinar
Thanks for your advice. I'll test it asap and we will see. It will take a few weeks to confirm whether it solves the issue, since it occurs randomly.
Silicon

Post by **Silicon** » Tue Sep 15, 2020 7:44 am

Hi Steinar
I have applied the recommended registry settings to all three servers in the farm, but unfortunately this did not solve the issue.

We have even had a situation where one transcode node was assigned two jobs for the identical file (i.e. the transcode node attempted to process the file twice).
Any idea what to check / test next?
Thank you.
BR,
Silicon

Post by **admin** » Tue Sep 15, 2020 10:11 am

Ok, could you send me the job logs of the dual jobs? In the status monitor, right-click and select "Open log folder...". Then zip the contents and send them by PM.

-steinar

Post by **Silicon** » Fri Jul 30, 2021 9:15 am

Hi @admin
As I have already PMed you, the issue occurs when the filename of the imported / processed file contains letters with diacritics, for example the Czech letter "č" (Windows-1250 codepage 0xE8; Unicode U+010D) or the French letter "è".

So let me repeat my question: do you have any recommendation on how to set up the regional settings on machines in an FFAStrans farm? Especially the "Language for non-Unicode programs" setting?
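For readers wondering why that setting could matter at all, here is a small Python illustration of how the same byte maps to different letters under different Windows codepages. It is only a sketch of the general encoding behaviour; whether FFAStrans passes file names through a non-Unicode code path is an assumption and is not confirmed anywhere in this thread. The helper `survives` is a made-up name.

```python
# One byte as it might appear in a file name handled by a non-Unicode program.
RAW = b"\xe8"

as_cp1250 = RAW.decode("cp1250")  # Central European codepage
as_cp1252 = RAW.decode("cp1252")  # Western European codepage

print(repr(as_cp1250), hex(ord(as_cp1250)))
print(repr(as_cp1252), hex(ord(as_cp1252)))


def survives(name, codepage):
    """True if `name` can be encoded in `codepage` and decoded back unchanged (hypothetical helper)."""
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeEncodeError:
        return False


print(survives("č", "cp1250"))
print(survives("č", "cp1252"))
```

If two hosts in a farm used different non-Unicode codepages, a name containing "č" could round-trip on one host and not on the other, so the two hosts could end up seeing what looks like two different file names.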
Thanks

Post by **FranceBB** » Fri Jul 30, 2021 5:45 pm

1) Can you tell me if it still happens with this version of exe_manager, processors.a3x and status_monitor.a3x? https://we.tl/t-aIlacT0z6Z

2) Can you send me the content of ffastrans.json in the installation folder, where there's the database ("db" folder)?

Set it to something like this:

    "max_logs_age": 14,
    "youtubedl_update": true,
    "rand_min": 200,
    "rand_max": 400,
    "core_multiplier": 1,
    "max_queue": 10,
    "queue_factor": 10,
    "proc_execute": true,
    "auto_pause": false,
    "submit_pri": 4,
    "max_job_list": 900,
    "max_retries": 5

and try again to see if it happens.
rand_min and rand_max set the range of the random wait between the moment a host finds a ticket and the moment it assigns the ticket to itself, after finding out that no other host has taken it.
As for max_queue and queue_factor, the way I set them is very restrictive, but only because that prevents too many tickets from going to running at the very same time, which is also a factor in concurrency problems.
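The random wait described above can be sketched roughly like this in Python. To be clear, this is not FFAStrans code: the function `try_claim`, the `.lock` file convention, and the assumption that the values are milliseconds are all made up for illustration. It only shows the general idea of "wait a random time, then take the ticket only if nobody else has".

```python
import os
import random
import tempfile
import time


def try_claim(ticket_dir, ticket_name, host, rand_min_ms=200, rand_max_ms=400):
    """Wait a random interval, then try to claim a ticket (hypothetical helper).

    Returns True if this host got the ticket, False if another host already had it.
    """
    # Random back-off so that two hosts that saw the ticket at the same
    # moment are unlikely to attempt the claim at the same moment.
    time.sleep(random.uniform(rand_min_ms, rand_max_ms) / 1000.0)

    lock_path = os.path.join(ticket_dir, ticket_name + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(host)  # record which host owns the ticket
    return True


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(try_claim(demo_dir, "job1", "node1"))
    print(try_claim(demo_dir, "job1", "node2"))
```

On a local disk the second call loses because the lock file already exists. A wider range between rand_min and rand_max makes it less likely that two hosts reach the check at the same moment, at the cost of a slower pickup.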

About the other settings, like "submit_pri", it's just the priority that the files submitted via the GUI get. I like to set it as high as possible, but it doesn't really matter in this case. Same goes for max_job_list, which basically is how many jobs are gonna be displayed in the status_monitor at the same time. Again, I've put a bogus value and it doesn't really matter in this case for this kind of test. Besides, you'll never reach 900 concurrent jobs with these settings anyway eheheheheh (unless you have like 900 conditionals xD)

Alright, let me know how it went and if the problem isn't solved, I'll try to look back and test again, but please, let's leave Grandmaster to rest as he's on vacation right now.

Post by **Silicon** » Fri Jul 30, 2021 5:53 pm

Hi FranceBB
I'll do my best to test it over the weekend.
So you believe that it is not caused by the codepage of the filename or by the server's regional settings?

I apologize for disturbing Grandmaster on vacation

… I thought he was back because he was responding to other tickets in this forum.
