FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Folder monitor sometimes stops seeing files in directory:
Clearing folder monitor history - does not help.
Removing files from the directory on sleep timer time - does not help.
The same problem occurred on FFAStrans v1.3.1, then I solved it by recreating the workflow from scratch. Exporting and re-importing workflow did not help as far as I remember.
Stopping/shutting down workflow does not help.Clearing folder monitor history - does not help.
Removing files from the directory on sleep timer time - does not help.
The same problem occurred on FFAStrans v1.3.1, then I solved it by recreating the workflow from scratch. Exporting and re-importing workflow did not help as far as I remember.
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Hey Mister DC,
question up in front, did you send a job log of a working job just for reference or does it contain any special informatoin for us?
ok debugging def_runner (internal name for the watchfolder script):
The "database" for monitors is here:
We remember files that have been already processed by writing a corresponding json file into the "root" of above folder.
New/incoming Files have a json in the subfolder \i until they end up in a job or being ignored forever (which moves the json one up from \i folder to root again).
In theory you reset the whole monitor by just deleting the whole "mons" folder for this workflow, this should be the same as "forget files" in the processor gui.
Also, monitoring is "started" (green on the UI), only when a ~start_%wf_guid% file is located here:
If monitoring is started, we must find an exe_manager process that runs the def_runner.a3x script and also it has -sys "workflowjson" in the commandline, like this:
Note, this process runs on all farm hosts. There will be one process for each "started" workflow.
Log files go to the def_log folder (they are often not very helpful).
What checks we do before we even come to listing files in def_runner script:
wfs\%wf_id%\mons\%node_id%/.~%NODE_ID%_%HOUR%_%ComputerName%
If your problem is again a file concurrency /cannot delete problem, it might be that this lockfile cannot be deleted, check if it exists and it is older than a few seconds.
From here the process lists the files as defined in the monitor config and if something found, writes to \i folder and if needed, moves back from \i folder and starts an according ffastrans job.
question up in front, did you send a job log of a working job just for reference or does it contain any special informatoin for us?
ok debugging def_runner (internal name for the watchfolder script):
The "database" for monitors is here:
Code: Select all
Processors\db\cache\wfs\%workflow_guid%\mons\%processor_guid%
New/incoming Files have a json in the subfolder \i until they end up in a job or being ignored forever (which moves the json one up from \i folder to root again).
In theory you reset the whole monitor by just deleting the whole "mons" folder for this workflow, this should be the same as "forget files" in the processor gui.
Also, monitoring is "started" (green on the UI), only when a ~start_%wf_guid% file is located here:
Code: Select all
\Processors\db\cache\status
Code: Select all
"C:\Temp\ffastrans1407\processors\exe_manager.exe" /ErrorStdOut /AutoIt3ExecuteScript "C:\Temp\ffastrans1407\processors\def_runner.a3x" -sys "C:\Temp\ffastrans1407\processors\db\configs\workflows\20241202-1232-1472-7fdd-319d07937bfd.json"
Log files go to the def_log folder (they are often not very helpful).
Code: Select all
Processors\db\cache\def_log
- Is the Wokflow "started"
- Sleep timer (time since last file listing, 30 by default)
- maintainance.run_on_days (cron stuff)
- is the monitor processor "bypassed"
- Is there valid JSON in the workflow json nodes.properties (of the mon_folder processor)
- Are there more files in \i folder than debug.max_queue in ffastrans.json
wfs\%wf_id%\mons\%node_id%/.~%NODE_ID%_%HOUR%_%ComputerName%
If your problem is again a file concurrency /cannot delete problem, it might be that this lockfile cannot be deleted, check if it exists and it is older than a few seconds.
From here the process lists the files as defined in the monitor config and if something found, writes to \i folder and if needed, moves back from \i folder and starts an according ffastrans job.
emcodem, wrapping since 2009 you got the rhyme?
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
The lock file has an automatic timeout...so if it persists for more than 45 minutes we ignore it and continue with monitoring. Starting and stopping should solve the problem, and clearing monitor cache should definately solve the problem. If they don't, please check if the .mxf files are actually not locked by other processes. If so FFAStrans will not pick them up unless you enable "Skip source verification".
If none of this helps then I think we need to ask you to enable more thorough logging. You can find it in the processors\db\configs\ffastrans.json file. Just open it and set "extra_log" to "true". No need to restart anything. The changes in the config file will be active a few seconds after you save the ffastrans.json file. After that please try and clear the monitor cache for the specific path and see if the files are picked up. If not, please send the latest files for the workflow id found in the Processors\db\cache\def_log folder.
-steinar
If none of this helps then I think we need to ask you to enable more thorough logging. You can find it in the processors\db\configs\ffastrans.json file. Just open it and set "extra_log" to "true". No need to restart anything. The changes in the config file will be active a few seconds after you save the ffastrans.json file. After that please try and clear the monitor cache for the specific path and see if the files are picked up. If not, please send the latest files for the workflow id found in the Processors\db\cache\def_log folder.
-steinar
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Attached just in case, thought it might be helpful.
Thanks for the as always detailed answers, guys. I'll try to dig deeper with the information you've given me
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Problem Worflow ID: 20241011-1720-1341-8ee6-84ca025e8ed0admin wrote: ↑Tue Dec 03, 2024 9:50 pm The lock file has an automatic timeout...so if it persists for more than 45 minutes we ignore it and continue with monitoring. Starting and stopping should solve the problem, and clearing monitor cache should definately solve the problem. If they don't, please check if the .mxf files are actually not locked by other processes. If so FFAStrans will not pick them up unless you enable "Skip source verification".
If none of this helps then I think we need to ask you to enable more thorough logging. You can find it in the processors\db\configs\ffastrans.json file. Just open it and set "extra_log" to "true". No need to restart anything. The changes in the config file will be active a few seconds after you save the ffastrans.json file. After that please try and clear the monitor cache for the specific path and see if the files are picked up. If not, please send the latest files for the workflow id found in the Processors\db\cache\def_log folder.
-steinar
Problem Node in it: 20241107-1516-3717-42ae-3bec17cdfae1
Here's what I've been doing:
I checked the directory '\\\192.168.24.231\FF_Install\Processors\db\cache\wfs\20241011-1720-1341-8ee6-84ca025e8ed0\mons' - it had three subfolders with the id's of each monitor folders.
I found an unreadable subfolder '\\192.168.24.231\FF_Install\Processors\db\cache\wfs\20241011-1720-1720-1341-8ee6-84ca025e8ed0\mons\20241107-1516-3717-42ae-3bec17cdfae1' with subfolder '\i'.
I restarted server 24.231 to resolve the folder blocking issue as restarting FF services didn't help. After the reboot there was no inaccessible folder there.
That didn't solve the problem.
Next I did:
Cleared the folder monitor cache for '\\nas05.tv.lan\SonyAdv' (node: 20241107-1516-3717-42ae-3bec17cdfae1) - didn't help;
I tried “Skip source verification” - no help;
Stopped/started worflow - no help;
Disabling/enabling workflow - didn't help.
At the moment there is only one *.mxf file left in the monitored directory, which is not picked up by the monitor. It is not blocked by anything - I can move it freely and according to Process Monitor there are no accesses to it (I checked on the server where the *.mxf file is stored). Processors\db\cache\wfs\20241011-1720-1341-8ee6-84ca025e8ed0\mons: Here are logs from 'Processors\db\cache\def_log': The directory '\\nas05.tv.lan\SonyAdv' doesn't seem to be monitored by anything else according to logs.
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Ok from now on i say "thanks for the quick response" only every 10th time if that's ok
So there is this elephant in the room... you found an unreadable subfolder. This is of course not normal (never seen that) and goes in direction of again concurrency issues.
The thing is that alone in our team we have lots of farms running, all on windows shares and no problems with such stuff. We really need to find out what's different for you.
In first place i can tell that before the last release, webint was reading all the mons/../i folders every few seconds. The "tickets" API call did this but in this latest release we don't do it anymore.
Knowing that you needed to restart webint to get rid of the lock of one of your monitor files, i suggest you try out if your watchfodler thingie works when either there is no webint running or only the latest release is running, better just turn it off for a few days or so.
In general we know that file concurrency issues can happen but from the experience with multiple installations they don't happen that often as they do for you. The final solution to the problem is of course that ffastrans switches from filesystem database to real database - and we are even experimenting with that from time to time BUT that will not happen anytime soon.
Until then, i hope you don't get tired helping us debugging the stuff. If needed i will think about writing some testing scripts for you to execute specialized tests regarding concurrent file/folder access from different nodes. This can potentially make problems visible faster, we remove complexity and reproduce the problem - THIS is exactly what we need to do always when we face a complex problem.
Edit: One thing we did not yet say is about Antivirus. It is usually not the problem but it has the potential to cause all kinds of strange issues with file access. Can you turn it off from doing on-access scan on the ffastrans db dir AND OR execute the processes involved like server.exe, exe_manager.exe and similar?
So there is this elephant in the room... you found an unreadable subfolder. This is of course not normal (never seen that) and goes in direction of again concurrency issues.
The thing is that alone in our team we have lots of farms running, all on windows shares and no problems with such stuff. We really need to find out what's different for you.
In first place i can tell that before the last release, webint was reading all the mons/../i folders every few seconds. The "tickets" API call did this but in this latest release we don't do it anymore.
Knowing that you needed to restart webint to get rid of the lock of one of your monitor files, i suggest you try out if your watchfodler thingie works when either there is no webint running or only the latest release is running, better just turn it off for a few days or so.
In general we know that file concurrency issues can happen but from the experience with multiple installations they don't happen that often as they do for you. The final solution to the problem is of course that ffastrans switches from filesystem database to real database - and we are even experimenting with that from time to time BUT that will not happen anytime soon.
Until then, i hope you don't get tired helping us debugging the stuff. If needed i will think about writing some testing scripts for you to execute specialized tests regarding concurrent file/folder access from different nodes. This can potentially make problems visible faster, we remove complexity and reproduce the problem - THIS is exactly what we need to do always when we face a complex problem.
Edit: One thing we did not yet say is about Antivirus. It is usually not the problem but it has the potential to cause all kinds of strange issues with file access. Can you turn it off from doing on-access scan on the ffastrans db dir AND OR execute the processes involved like server.exe, exe_manager.exe and similar?
emcodem, wrapping since 2009 you got the rhyme?
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Sorry i forgot... so the important stuff is that we get it running again, right?
Folder of interest is the missing subfolder for the SonyAdv Folder in db\cache\wfs\20241011-1720-1341-8ee6-84ca025e8ed0:
This folder should be created by the same process that runs for the other 2 watchfolders in this workflow, so the process/script itself runs. One of the first things the def_runner script does is to create the folder if it's not there, obviously this fails.
Can you create this folder manually?
You can also try to just copy/duplicate the workflow so it gets a new GUID and therefore all the folders in mons dir are recreated. When this copy is startet, the 3 subdirectories in cache/wfs/%workflowguid% must be created - if not you can stop at this point and let us know.
Folder of interest is the missing subfolder for the SonyAdv Folder in db\cache\wfs\20241011-1720-1341-8ee6-84ca025e8ed0:
Code: Select all
20241107-1516-3717-42ae-3bec17cdfae1
Can you create this folder manually?
You can also try to just copy/duplicate the workflow so it gets a new GUID and therefore all the folders in mons dir are recreated. When this copy is startet, the 3 subdirectories in cache/wfs/%workflowguid% must be created - if not you can stop at this point and let us know.
emcodem, wrapping since 2009 you got the rhyme?
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
Hi DCCentR,
Reading the def_logs I notice that the monitoring of '\\nas05.tv.lan\SonyAdv' does not even start. So it's natural that it does not pick up anything. However, WHY it does not start is a bit of mystery.
Could you please send the current workflow id 20241011-1720-1341-8ee6-84ca025e8ed0 that does not work? AND the backup \db\configs\archive\workflows\20241011-1720-1341-8ee6-84ca025e8ed0_bac.json. We just need to make sure there is nothing odd with the workflow.
I'm also attaching a modified version of def_runner.a3x for you to replace the existing in \processors. Just rename the old and replace. Please stop the workflow before replacing. This modified one will add extra logs that might shed some light on what's going on within ffastrans.
Thanks for helping debugging this issue!
-steinar
Reading the def_logs I notice that the monitoring of '\\nas05.tv.lan\SonyAdv' does not even start. So it's natural that it does not pick up anything. However, WHY it does not start is a bit of mystery.
Could you please send the current workflow id 20241011-1720-1341-8ee6-84ca025e8ed0 that does not work? AND the backup \db\configs\archive\workflows\20241011-1720-1341-8ee6-84ca025e8ed0_bac.json. We just need to make sure there is nothing odd with the workflow.
I'm also attaching a modified version of def_runner.a3x for you to replace the existing in \processors. Just rename the old and replace. Please stop the workflow before replacing. This modified one will add extra logs that might shed some light on what's going on within ffastrans.
Thanks for helping debugging this issue!
-steinar
- Attachments
-
- def_runner.a3x
- (136.19 KiB) Downloaded 75 times
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
You're welcome I wish i have more time tinkering with the FF
I see, this looks likE A LOT of work - but it should pay off in the future.emcodem wrote: ↑Wed Dec 04, 2024 11:22 pm In general we know that file concurrency issues can happen but from the experience with multiple installations they don't happen that often as they do for you. The final solution to the problem is of course that ffastrans switches from filesystem database to real database - and we are even experimenting with that from time to time BUT that will not happen anytime soon.
Yep, I added the FF installation directory and cached directories for all machines in the farm to antivirus exceptions.emcodem wrote: ↑Wed Dec 04, 2024 11:22 pm Edit: One thing we did not yet say is about Antivirus. It is usually not the problem but it has the potential to cause all kinds of strange issues with file access. Can you turn it off from doing on-access scan on the ffastrans db dir AND OR execute the processes involved like server.exe, exe_manager.exe and similar?
Also added these registry keys for all servers involved in the farm (and the servers hosting the ff installation directory and its cache) per Steinar's advice and rebooted the machines.
I tried creating it manually didn't help. Somehow that file was eventually recognized and processed that evening, probably after I recreated the problematic processor monitor.emcodem wrote: ↑Thu Dec 05, 2024 8:03 am Sorry i forgot... so the important stuff is that we get it running again, right?
Folder of interest is the missing subfolder for the SonyAdv Folder in db\cache\wfs\20241011-1720-1341-8ee6-84ca025e8ed0:This folder should be created by the same process that runs for the other 2 watchfolders in this workflow, so the process/script itself runs. One of the first things the def_runner script does is to create the folder if it's not there, obviously this fails.Code: Select all
20241107-1516-3717-42ae-3bec17cdfae1
Can you create this folder manually?
You can also try to just copy/duplicate the workflow so it gets a new GUID and therefore all the folders in mons dir are recreated. When this copy is startet, the 3 subdirectories in cache/wfs/%workflowguid% must be created - if not you can stop at this point and let us know.
**************
But today is a new day and a new stack of unprocessed files Most of the files in today's batch have been processed. But there are still two that were not picked up. So it looks like it sees and monitors the directory, but there is a problem with some files falling into it. Moreover, if I do “Rebuild cache” it creates json for them: Today I tried accessing the network share by IP instead of domain name (\\192.168.24.7\SonyAdv instead of \\nas05.tv.lan\SonyAdv). I didn't notice any difference.
Duplicated Worflow (to change ID), recreated monitor processor (to change ID), tried clearing the cache, Stopped and turned off this worflow, Tried to set a specific server to execute this wokrflow - no luck.
I turned off the workflow. Started Process Monitor on the servers (SRV043 - the only one designated server for this worflow) that are assigned to execute this workflow and started the workflow: And it looks like it's trying to do something with these mxf's but to no avail. What do you think that means?
While logging, I made a copy of one of these mxf and put it in the same directory as “TEST6.mxf" - it processed fine by the workflow.
Here is the log file - https://dropmefiles.com/MFNVP
The workflow itself and its archive:
Sorry but i recreate the wokflow under new ID, but here is the archive of the old on:
Thank you very much, I will definitely try this tomorrow. Do I also need to enable extra_log in FF for it?admin wrote: ↑Thu Dec 05, 2024 6:11 pm I'm also attaching a modified version of def_runner.a3x for you to replace the existing in \processors. Just rename the old and replace. Please stop the workflow before replacing. This modified one will add extra logs that might shed some light on what's going on within ffastrans.
Thanks for helping debugging this issue!
-steinar
UPD: you know what? After I wrote this post I tried turning on 'skip source verification' - the files were processed and moved successfully from the folder
Earlyer I tried to see if anything uses these mxf files on \\192.168.24.7\SonyAdv (\\nas05.tv.lan\SonyAdv) - nothing found. I also tried moving them from this folder - they moved freely.
Could this be a false positive on locked file state? Or some kind of self lock by FF?
Re: FFAStrans v1.4.0.7 - folder monitor sometimes stops seeing files
OK nice, at least something to work with (more or less).
Well if the file was really locked, you could not move it around i guess. Something strange is going on. This filesystem stuff is always hard to debug as its barely documented and as it's a distributed system it is even very hard to log and even harder to interpret the logs once you have some extensive log files e.g. from file server operations (guess we need microsoft for that, i have no clue how to debug SMB Server low-level ops).
No matter what, please use the def_runner.a3x that steinar sent, i doubt you need to turn on debug logs (in case he dont answer) and leave it in place. The reason for that is that the previous def_log you sent does not contain "enum:" logs for the sony folder at all, while in your last post, the file was checked continuously, which means the "enum:" logs must for the sony folder must be in the def_log at the times when we see exe_manager accessing the files in your procmon. You should even see "verify:" entries if debug.extra_log is on and a new file is being "verified".
Maybe we face different issues or there was another reason why the def_log did not contain "enum" log entries.
Coming to your procmon logs:
Just like in the last procmon you sent that was about the locked monitor jsons, we again see that msmpeng.exe is definitely in the game (should be Defender). Also, we see all operations that our def_runner script does up to the point where it finally tries to "open the file for "shared reading". This final check ends up in a "SHARING VIOLATION", which just means something thinks most likely the SMB Server thinks the file is locked.
I'd suggest 2 things, first as said, replace the def_runner and leave the debug.extra_log on.
Second, add an exclusion in Defender for exe_manager.exe. I tested that and after it i did not see any file access from msmpeng.exe anymore in combination with exe_manager.
Well if the file was really locked, you could not move it around i guess. Something strange is going on. This filesystem stuff is always hard to debug as its barely documented and as it's a distributed system it is even very hard to log and even harder to interpret the logs once you have some extensive log files e.g. from file server operations (guess we need microsoft for that, i have no clue how to debug SMB Server low-level ops).
No matter what, please use the def_runner.a3x that steinar sent, i doubt you need to turn on debug logs (in case he dont answer) and leave it in place. The reason for that is that the previous def_log you sent does not contain "enum:" logs for the sony folder at all, while in your last post, the file was checked continuously, which means the "enum:" logs must for the sony folder must be in the def_log at the times when we see exe_manager accessing the files in your procmon. You should even see "verify:" entries if debug.extra_log is on and a new file is being "verified".
Maybe we face different issues or there was another reason why the def_log did not contain "enum" log entries.
Coming to your procmon logs:
Just like in the last procmon you sent that was about the locked monitor jsons, we again see that msmpeng.exe is definitely in the game (should be Defender). Also, we see all operations that our def_runner script does up to the point where it finally tries to "open the file for "shared reading". This final check ends up in a "SHARING VIOLATION", which just means something thinks most likely the SMB Server thinks the file is locked.
I'd suggest 2 things, first as said, replace the def_runner and leave the debug.extra_log on.
Second, add an exclusion in Defender for exe_manager.exe. I tested that and after it i did not see any file access from msmpeng.exe anymore in combination with exe_manager.
emcodem, wrapping since 2009 you got the rhyme?