A file dropped to watchfolder processed by both transcode nodes in a farm
Hi folks.
First of all THANKS A LOT for FFAStrans ... a great tool!
I have the following farm setup:
- 1 FFAStrans running in VM (Windows Server 2008 R2 SP1) and acting as "manager node"
- 2 FFAStrans running on physical machines (one is WS2008 R2 SP1, other one is WS2012 R2) and acting as "transcode nodes"
- I'm using version 1.0.0.5 of FFAStrans, which is running as service on all three hosts, system is set up to use local cache
- both source watchfolder and target dropfolder are located on EMC Isilon cluster (OneFS 8.0.0.1) and accessed via UNC path
I have encountered the following problem: a file dropped to the watchfolder is erroneously processed by both transcode nodes in the farm:
- transcode node #1 picks up the source file
- approx. 4 sec later transcode node #2 picks up the same source file as well
Fortunately this issue doesn't occur very often (3 times so far), but I would like to solve it anyway.
Any ideas what is causing the issue?
Thanks for help,
Silicon
P.S. The workflow is quite simple:
- it moves XML files from source watchfolder to target dropfolder
- it converts all non-PCM/WAV files to PCM/WAV and drops the result to target dropfolder
- it converts all non-XDCAM-compliant files to IMX50/MXF or XDCAM HD50/MXF (depending on resolution) and drops the result to target dropfolder
Attachment: Workflow.PNG (31.28 KiB)
BR,
Silicon
--------
FFAStrans 1.3.0.2; WebInterface 1.3.0.0
Manager: VM: 2x Xeon E5-2630v3@2.4GHz, 8GB RAM
Workers: 3x HP DL360 G9 (2x Xeon E5-2643v3@3.4GHz,16GB RAM, nVidia M2000)+ 2x Lenovo SR665 (2x AMD EPYC730216C@3.0GHz,128GB RAM, nVidia P2200)
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hey Silicon,
welcome to the Forum and thank you for using FFAStrans.
This issue is probably connected to a problem with concurrent file access to the installation files, or more exactly the /db files, of FFAStrans.
Are the installation files also located on your Isilon, or is your "manager node" serving them through an SMB share?
emcodem, wrapping since 2009 you got the rhyme?
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hi emcodem
All FFAStrans binary and db files are located on my "manager node" and accessed from "transcode nodes" through SMB share.
BR,
Silicon
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hi Silicon,
You're using the exact same setup as I do on my in-production test farm: 1 VM hosting the program files and db share, and 7-8 standalone hosts ranging from Win 8.1 to Server 2016. This works quite well for us, and we don't have this kind of issue. But due to the "masterless" design of FFAStrans, this is actually something we cannot guarantee will never happen.
However, there are some things you can do to improve it. First, you should not use 2008 servers anymore. Best would be to use something newer like 2016, but even 2012 would be a better option than 2008. I have found that this is less likely to happen on a newer OS like Win 10/Server 2016. You can also try setting these registry parameters:
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v DirectoryCacheLifetime /t REG_DWORD /d 0
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v FileNotFoundCacheLifetime /t REG_DWORD /d 0
reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /f /v FileInfoCacheLifetime /t REG_DWORD /d 0
Just run these three lines on your servers (as admin). They make your servers "ask" the share for every file operation instead of relying on a cache.
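If the same three values have to be applied and then checked on several hosts, the command lines can be generated rather than retyped. The following is only a small sketch in Python: it builds the three `reg add` lines quoted above plus matching `reg query` lines for checking the values afterwards. The helper names are made up for illustration, and the script does not touch the registry itself; it only prints text to paste into an elevated command prompt.

```python
# Registry key and value names are taken from the commands quoted above.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUES = (
    "DirectoryCacheLifetime",
    "FileNotFoundCacheLifetime",
    "FileInfoCacheLifetime",
)


def reg_add_commands(data=0):
    """Build one 'reg add' line per cache value (hypothetical helper)."""
    return [f"reg add {KEY} /f /v {name} /t REG_DWORD /d {data}" for name in VALUES]


def reg_query_commands():
    """Build one 'reg query' line per cache value, to check what is currently set."""
    return [f"reg query {KEY} /v {name}" for name in VALUES]


if __name__ == "__main__":
    # Print everything so it can be pasted into an elevated prompt on each host.
    for line in reg_add_commands() + reg_query_commands():
        print(line)
```

Running the printed `reg query` lines on each node afterwards is a quick way to confirm that all hosts in the farm really carry the same settings.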
Hope this might help.
-steinar
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hi Steinar
Thanks for your advice. I’ll test it asap and we will see. It will take a few weeks to confirm whether it solves the issue, since it occurs randomly.
Silicon
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hi Steinar
I have applied the recommended registry settings to all three servers in the farm, but unfortunately they did not solve the issue.
We have even had a situation where one transcode node was assigned two jobs for the identical file (i.e. the transcode node attempted to process the file twice).
Any idea what to check / test next?
Thank you.
BR,
Silicon
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Ok, could you send me the job logs of the dual jobs? In the status monitor, right-click and select "Open log folder...". Then zip the contents and send them by PM.
-steinar
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hi @admin
As I have already PMed you, the issue occurs when the filename of the imported / processed file contains letters with diacritics, for example the Czech letter "č" (Windows-1250 code page 0xE8; Unicode U+010D) or the French letter "è".
So let me repeat my question: do you have any recommendation on how to set up Regional settings on the machines in an FFAStrans farm? Especially the "Language for non-Unicode programs" setting?
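To show why the "Language for non-Unicode programs" setting could matter here, below is a small Python sketch of what happens to such a name when it passes through two different ANSI code pages. It only demonstrates the encoding facts (the same byte 0xE8 is "č" in Windows-1250 and "è" in Windows-1252); whether FFAStrans handles filenames this way is an assumption, not something confirmed in this thread. The function names are made up for illustration.

```python
def reinterpret(name: str, written_as: str, read_as: str) -> str:
    """Encode a name with one ANSI code page and decode the same bytes with another."""
    return name.encode(written_as).decode(read_as)


def representable(name: str, codepage: str) -> bool:
    """Return True if every character of the name exists in the given code page."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    czech = "\u010d"  # the Czech letter from the post above (U+010D)
    # ascii() keeps the output printable on any console code page.
    print(ascii(czech.encode("cp1250")))                 # the byte stored under Windows-1250
    print(ascii(reinterpret(czech, "cp1250", "cp1252")))  # what a Windows-1252 host makes of it
    print(representable(czech, "cp1252"))                 # can a Windows-1252 host express it at all?
```

If two hosts with different non-Unicode code pages compared such names through an ANSI code path, they would not agree on the filename, so it seems worth checking that all nodes use the same setting.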
Thanks
Attachments:
- Filename without czech diacritics - processed once.PNG (9.96 KiB)
- Filename with czech diacritics - processed twice.PNG (14.2 KiB)
BR,
Silicon
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
1) Can you tell me if it still happens with this version of exe_manager, processors.a3x and status_monitor.a3x? https://we.tl/t-aIlacT0z6Z
2) Can you send me the content of ffastrans.json in the installation folder, where the database ("db" folder) is?
Set it to something like this:
"max_logs_age": 14,
"youtubedl_update": true,
"rand_min": 200,
"rand_max": 400,
"core_multiplier": 1,
"max_queue": 10,
"queue_factor": 10,
"proc_execute": true,
"auto_pause": false,
"submit_pri": 4,
"max_job_list": 900,
"max_retries": 5
and try again to see if it happens.
rand_min and rand_max set how long a host waits between finding a ticket and assigning it to itself, once it has found that no other host has taken it.
About max_queue and queue_factor: I set them very low, but only because that prevents too many tickets from going to running at the very same time, which is also a factor in concurrency problems.
About the other settings, like "submit_pri", it's just the priority that files submitted via the GUI get. I like to set it as high as possible, but it doesn't really matter in this case. Same goes for max_job_list, which basically is how many jobs are displayed in the status_monitor at the same time. Again, I've put a bogus value and it doesn't really matter for this kind of test. Besides, you'll never reach 900 concurrent jobs with these settings anyway eheheheheh (unless you have like 900 conditionals xD)
Alright, let me know how it went, and if the problem isn't solved I'll try to look back and test again, but please, let's leave Grandmaster to rest as he's on vacation right now.
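For readers wondering what the rand_min / rand_max wait is meant to achieve, here is a rough Python sketch of the general idea: pause for a random time, then try to claim the ticket with an exclusive file create so that only one host can win. This is not FFAStrans code. The ".claim" file, the function name and the assumption that the values are milliseconds are all made up for illustration, and on a network share the exclusive create is only as reliable as the file server behind it.

```python
import os
import random
import tempfile
import time


def try_claim(ticket_dir, ticket_name, host, rand_min=200, rand_max=400, sleep=time.sleep):
    """Return True if this host claimed the ticket, False if another host already did.

    rand_min / rand_max are treated as milliseconds here (an assumption).
    """
    claim_path = os.path.join(ticket_dir, ticket_name + ".claim")
    # Random pause so two hosts that saw the ticket at the same moment
    # are unlikely to act on it in lockstep.
    sleep(random.uniform(rand_min, rand_max) / 1000.0)
    try:
        # O_CREAT | O_EXCL fails if the file already exists,
        # so only one creator can succeed.
        fd = os.open(claim_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(host)  # record the winner for later inspection
    return True


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    print(try_claim(folder, "ticket-001", "node1"))
    print(try_claim(folder, "ticket-001", "node2"))
```

The random wait only makes a collision less likely; it is the exclusive create that actually decides the winner. If hosts see stale directory or file information from the share, even that check can be fooled, which fits the caching advice given earlier in the thread.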
Re: A file dropped to watchfolder processed by both transcode nodes in a farm
Hi FranceBB
I’ll do my best to test it over the weekend.
So you believe that it is not caused by the codepage of the filename or by the server’s regional settings?
I apologize for disturbing Grandmaster on vacation … I thought he was back because he was responding to other tickets in this forum.
BR,
Silicon