A file dropped to watchfolder processed by both transcode nodes in a farm

Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Post by Silicon »

Hi folks.
First of all THANKS A LOT for FFAStrans ... a great tool!

I have the following farm setup:
- 1 FFAStrans running in a VM (Windows Server 2008 R2 SP1) and acting as the "manager node"
- 2 FFAStrans running on physical machines (one is WS2008 R2 SP1, the other is WS2012 R2) and acting as "transcode nodes"
- I'm using version 1.0.0.5 of FFAStrans, which runs as a service on all three hosts; the system is set up to use the local cache
- both the source watchfolder and the target dropfolder are located on an EMC Isilon cluster (OneFS 8.0.0.1) and accessed via UNC paths

I have encountered the following problem: a file dropped into the watchfolder is erroneously processed by both transcode nodes in the farm:
- transcode node #1 picks up the source file
- approx. 4 sec later, transcode node #2 picks up the same source file as well :o

Fortunately this issue doesn't occur very often (3 times so far), but I would like to solve it anyway.
Any ideas what is causing the issue?

Thanks for help,
Silicon

P.S. The workflow is quite simple:
- it moves XML files from the source watchfolder to the target dropfolder
- it converts all non-PCM/WAV files to PCM/WAV and drops the result into the target dropfolder
- it converts all non-XDCAM-compliant files to IMX50/MXF or XDCAM HD50/MXF (depending on resolution) and drops the result into the target dropfolder
Attachment: Workflow.PNG
BR,
Silicon
--------
FFAStrans 1.3.0.2; WebInterface 1.3.0.0
Manager: VM: 2x Xeon E5-2630v3@2.4GHz, 8GB RAM
Workers: 3x HP DL360 G9 (2x Xeon E5-2643v3@3.4GHz,16GB RAM, nVidia M2000)+ 2x Lenovo SR665 (2x AMD EPYC730216C@3.0GHz,128GB RAM, nVidia P2200)
emcodem
Posts: 1811
Joined: Wed Sep 19, 2018 8:11 am

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by emcodem »

Hey Silicon,
welcome to the Forum and thank you for using ffastrans :-)

This issue is probably connected to a problem with concurrent file access to the installation files or, more exactly, the db files of FFAStrans.
Are the installation files also located on your Isilon, or is your "manager node" serving them through an SMB share?
emcodem, wrapping since 2009 you got the rhyme?
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by Silicon »

Hi emcodem
All FFAStrans binary and db files are located on my "manager node" and accessed from the "transcode nodes" through an SMB share.
BR,
Silicon
admin
Site Admin
Posts: 1687
Joined: Sat Feb 08, 2014 10:39 pm

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by admin »

Hi Silicon,

You're using the exact same setup as I do on my in-production test farm: 1 VM hosting the program files and db share, and 7-8 standalone hosts ranging from Win 8.1 to Server 2016. This works quite well for us, and we don't have this kind of issue. But due to the "masterless" design of FFAStrans, this is actually something we cannot guarantee will never happen. However, there are some things you can do to improve it. First, you should not use 2008 servers anymore. Best would be something newer like 2016, but even 2012 would be a better option than 2008. I have found that this is less likely to happen on a newer OS like Win 10/Server 2016. You can also try setting these registry parameters:

reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v DirectoryCacheLifetime /t REG_DWORD /d 0
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v FileNotFoundCacheLifetime /t REG_DWORD /d 0
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v FileInfoCacheLifetime /t REG_DWORD /d 0


Just run these three lines on your servers (as admin). They cause your servers to "ask" the share for every file operation instead of relying on a cache.
Hope this might help.
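
One way to confirm that the three values above actually took effect is to read them back with `reg query`. The following is a minimal Python sketch, not part of FFAStrans. The helper names `query_commands` and `is_zero` are made up for illustration, and the parsing assumes the usual three-column `reg query` output (name, type, hex data).

```python
import platform
import subprocess

# Key and value names taken from the three `reg add` lines above.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUES = (
    "DirectoryCacheLifetime",
    "FileNotFoundCacheLifetime",
    "FileInfoCacheLifetime",
)


def query_commands():
    """Build one `reg query` command per cache value (hypothetical helper)."""
    return [["reg", "query", KEY, "/v", name] for name in VALUES]


def is_zero(reg_output, name):
    """Return True if the `reg query` output lists `name` as a REG_DWORD of 0."""
    for line in reg_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == name and parts[1] == "REG_DWORD":
            return int(parts[2], 16) == 0
    return False


# Only touch the registry when actually running on Windows.
if __name__ == "__main__" and platform.system() == "Windows":
    for cmd in query_commands():
        result = subprocess.run(cmd, capture_output=True, text=True)
        state = "0" if is_zero(result.stdout, cmd[-1]) else "not set to 0"
        print(cmd[-1], "->", state)
```

Run it on each host in the farm after applying the settings. On a non-Windows machine it only defines the helpers and does nothing else.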

-steinar
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by Silicon »

Hi Steinar
Thanks for your advice. I'll test it asap and we will see. Since the issue is random, it will take a few weeks to confirm whether this solves it.
Silicon
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by Silicon »

Hi Steinar
I have applied the recommended registry settings to all three servers in the farm, but unfortunately they did not solve the issue :-(
We have even had a situation where one transcode node was assigned two jobs for the identical file (i.e. the transcode node attempted to process the file twice).
Any idea what to check / test next?
Thank you.
BR,
Silicon
admin
Site Admin
Posts: 1687
Joined: Sat Feb 08, 2014 10:39 pm

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by admin »

Ok, could you send me the job logs of the dual jobs? In the status monitor, right-click and select "Open log folder...". Then zip the contents and send them via PM.

-steinar
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by Silicon »

Hi @admin
As I have already PMed you, the issue occurs when the filename of the imported / processed file contains letters with diacritics, for example the Czech letter "č" (Windows-1250 codepage 0xE8; Unicode U+010D) or the French letter "è".

So let me repeat my question: do you have any recommendation on how to set up the regional settings on machines in an FFAStrans farm? Especially the "Language for non-Unicode programs" setting?
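
As an illustration of why the "Language for non-Unicode programs" setting might matter (a sketch only, not a confirmed diagnosis of what FFAStrans does internally): the byte 0xE8 is "č" under Windows-1250 but "è" under Windows-1252, and a name containing "č" does not survive a round trip through Windows-1252. The Python below shows both effects. The filename and the `roundtrip` helper are hypothetical.

```python
letter = "\u010d"  # "č", the Czech letter mentioned above

# The same single byte maps to different letters under different ANSI code pages.
raw = letter.encode("cp1250")
print(raw.hex())             # e8
print(raw.decode("cp1252"))  # è


def roundtrip(filename, codepage):
    """Encode a Unicode filename to an ANSI code page and decode it again.

    Characters the code page cannot represent are replaced with "?",
    so the result differs from the input when information is lost.
    """
    return filename.encode(codepage, errors="replace").decode(codepage)


original = "zpr\u00e1va_\u010d1.mxf"  # hypothetical filename "zpráva_č1.mxf"
print(roundtrip(original, "cp1250") == original)  # True: Windows-1250 has both letters
print(roundtrip(original, "cp1252") == original)  # False: "č" is not in Windows-1252
```

If any component in the chain handled filenames through an ANSI code page, hosts with different system locales could end up seeing different names for the same file. Whether that happens here is an assumption that the job logs would have to confirm.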
Thanks
Attachments:
Filename without czech diacritics - processed once.PNG
Filename with czech diacritics - processed twice.PNG
FranceBB
Posts: 264
Joined: Sat Jun 25, 2016 3:43 pm

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by FranceBB »

1) Can you tell me if it still happens with this version of exe_manager, processors.a3x and status_monitor.a3x? https://we.tl/t-aIlacT0z6Z

2) Can you send me the content of ffastrans.json in the installation folder, where there's the database ("db" folder)?

Set it to something like this:

    "max_logs_age": 14,
    "youtubedl_update": true,
    "rand_min": 200,
    "rand_max": 400,
    "core_multiplier": 1,
    "max_queue": 10,
    "queue_factor": 10,
    "proc_execute": true,
    "auto_pause": false,
    "submit_pri": 4,
    "max_job_list": 900,
    "max_retries": 5
and try again to see if it happens.
rand_min and rand_max set the range for how long a host waits between finding a ticket and assigning it to itself, once it has found that no other host has taken it.
As for max_queue and queue_factor, they're very limited the way I set them, but only 'cause they're gonna prevent too many tickets from going to running at the very same time, which is also a factor in concurrency problems.
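
The idea behind rand_min and rand_max described above can be sketched as follows. This is a simplified simulation in Python, not FFAStrans's actual implementation: each simulated host waits a random delay in that range and then tries to claim the ticket by creating a lock file with an exclusive create. The lock-file naming, the helpers `try_claim` and `simulate`, and the reading of the values as milliseconds are all assumptions made for the example.

```python
import os
import random
import tempfile
import threading
import time

RAND_MIN = 200  # values from the snippet above; unit assumed to be milliseconds
RAND_MAX = 400


def try_claim(ticket_dir, ticket, host, winners):
    """Wait a random delay, then try to claim the ticket with an exclusive create."""
    time.sleep(random.randint(RAND_MIN, RAND_MAX) / 1000)
    lock_path = os.path.join(ticket_dir, ticket + ".lock")  # hypothetical naming
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one caller succeeds.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return  # another host claimed the ticket first
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(host)
    winners.append(host)


def simulate(hosts=("node1", "node2")):
    """Let several simulated hosts race for one ticket; return the hosts that won."""
    winners = []
    with tempfile.TemporaryDirectory() as ticket_dir:
        threads = [
            threading.Thread(target=try_claim, args=(ticket_dir, "job-0001", host, winners))
            for host in hosts
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    return winners


print(simulate())
```

On a local disk the exclusive create lets exactly one simulated host win, and the random delay mainly spreads the attempts out in time. Whether a claim is equally reliable on a network share depends on how the share and client-side caching behave, which is the concern raised earlier in the thread.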

About the other settings, like "submit_pri", it's just the priority that the files submitted via the GUI get. I like to set it as high as possible, but it doesn't really matter in this case. Same goes for max_job_list, which basically is how many jobs are gonna be displayed in the status_monitor at the same time. Again, I've put a bogus value and it doesn't really matter in this case for this kind of test. Besides, you'll never reach 900 concurrent jobs with these settings anyway eheheheheh (unless you have like 900 conditionals xD)


Alright, let me know how it went, and if the problem isn't solved, I'll try to look back and test again, but please, let's let Grandmaster rest as he's on vacation right now. :)
Silicon
Posts: 98
Joined: Fri Sep 04, 2020 6:34 am

Re: A file dropped to watchfolder processed by both transcode nodes in a farm

Post by Silicon »

Hi FranceBB
I’ll do my best to test it over the weekend.
So you believe that it is not caused by the codepage of the filename or by the server's regional settings?

I apologize for disturbing Grandmaster on vacation :roll: … I thought he was back because he was responding to other tickets in this forum.