Status Monitor bug

Posted: Fri Nov 01, 2024 10:09 am
by artjuice
Hi,

In the latest version, 1.4.0.7, there is a bug in the Status Monitor: some jobs are not cleared and stay there.
[Screenshot: 2024-11-01_Remote Desktop Manager-004409.png]
At the same time, all materials and files are transferred in full.

Re: Status Monitor bug

Posted: Fri Nov 01, 2024 8:20 pm
by FranceBB
What the Status Monitor reads are the tickets in processors/db/cache/monitor.
Each "job" has a json file there that the relevant host writes to and keeps updated, so that you can actually see the progress, with the status updating as the job goes through all its steps.
Once the job is done, the json is supposed to be deleted. However, a file can end up "locked" if the host still has it open and isn't "releasing" it, in which case the deletion fails and the ticket stays in the monitor. This is a particularly nasty problem that I also ran into in my own branches, and we're fully aware of it.
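To check which tickets are lingering, here is a minimal Python sketch (not FFAStrans code) that lists the .json files in the monitor folder together with the time since each one was last written. Since hosts keep the tickets of running jobs updated, a ticket whose age keeps growing is a likely leftover; how often a host actually rewrites its ticket is an assumption here, so judge the ages against your own workflows. list_tickets is a hypothetical helper name.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def list_tickets(monitor_dir, now: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (file name, seconds since last write) for each .json ticket, oldest first."""
    now = time.time() if now is None else now
    tickets = []
    for path in Path(monitor_dir).glob("*.json"):
        try:
            age = now - path.stat().st_mtime
        except FileNotFoundError:
            continue  # ticket was deleted between the directory scan and stat()
        tickets.append((path.name, age))
    # Largest age first, i.e. the least recently updated tickets at the top.
    return sorted(tickets, key=lambda t: t[1], reverse=True)
```

For example, list_tickets(r"C:\FFAStrans\Processors\db\cache\monitor"), using the path from later in this thread; your installation folder may differ.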

But... there's light at the end of the tunnel.
My eyes lit up when I saw a commit from Grandmaster a few days ago with FileClose(FileOpen()).
I won't copy-paste it here, but in the code the FileOpen() part builds the path from where the db resides: \monitor\ is appended to it, along with the job uuid and, in split workflows, any split id, to form the actual json file in the folder I mentioned at the very beginning. The file is opened in mode 10, i.e. write mode (erasing previous contents) plus creating the directory structure if it doesn't exist, which is the case the very first time you run FFAStrans.

Anyway, the really big change here is FileClose() wrapping FileOpen(). That ensures that once the data (i.e. the progress) is written to the json file and the json is properly saved on the storage, the file is "released" and no longer shows up as in use by the storage's filesystem. Then, and only then, the processor tries to delete the file with an async del.
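The same "write, release, then delete" order can be sketched in Python, purely as an illustration; this is not the AutoIt code from the commit. A with block closes the handle as soon as the json is written, mkdir(parents=True, exist_ok=True) plays the part of the create-path flag, and mode "w" truncates like AutoIt's overwrite mode. The file-name format (uuid, then "~" and the split id) and the helper names write_ticket and finish_ticket are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def write_ticket(db_dir, job_uuid: str, status: dict, split_id: Optional[str] = None) -> Path:
    """Write a progress ticket and release the handle before returning."""
    monitor = Path(db_dir) / "monitor"
    # Create the folder if missing, like the create-path part of mode 10.
    monitor.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: job uuid, plus "~<split id>" for split workflows.
    name = job_uuid + (f"~{split_id}" if split_id else "") + ".json"
    path = monitor / name
    # "w" erases previous contents; leaving the with block closes the file,
    # so nothing keeps it locked afterwards (the FileClose(FileOpen()) idea).
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(status, fh)
    return path


def finish_ticket(path: Path) -> None:
    """Delete a finished ticket; safe because write_ticket already closed it."""
    path.unlink(missing_ok=True)  # no error if it is already gone
```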

The async deletion is another bit of Grandmaster's magic in the processors. It's basically a function that keeps trying to delete the file, even if that takes multiple attempts. It first tries the standard FileDelete() function; if that doesn't work and the file is still there, it tries again with a different method, namely _WinAPI_DeleteFile(), which uses the Windows API to delete it. If the file still exists after that, it creates a cmd script that tries to delete it over and over again until it's gone. I mean, surely by then the file (in this case the completed ticket in the monitor folder) will have been deleted.
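The escalating deletion described above can be sketched like this in Python. It illustrates the idea and is not Grandmaster's implementation: std_delete and win_api_delete are hypothetical stand-ins for FileDelete() and _WinAPI_DeleteFile() (the latter calls kernel32's DeleteFileW through ctypes, Windows only), and a background thread stands in for the generated cmd retry script. The retry interval and the optional retry cap are assumptions.

```python
import os
import threading
import time
from pathlib import Path
from typing import Callable, Optional, Sequence

Deleter = Callable[[Path], None]


def std_delete(path: Path) -> None:
    """Plain delete, a rough counterpart of FileDelete()."""
    os.remove(path)


def win_api_delete(path: Path) -> None:
    """Call kernel32.DeleteFileW, a rough counterpart of _WinAPI_DeleteFile()."""
    if os.name != "nt":
        raise OSError("Windows API deletion is only available on Windows")
    import ctypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    if not kernel32.DeleteFileW(str(path)):
        raise ctypes.WinError(ctypes.get_last_error())


def delete_with_fallbacks(
    path,
    methods: Sequence[Deleter] = (std_delete, win_api_delete),
    retry_interval: float = 1.0,
    max_retries: Optional[int] = None,
) -> Optional[threading.Thread]:
    """Try each method once; if the file survives, keep retrying in the background.

    Returns None if the file is already gone, otherwise the retry thread.
    max_retries=None means retry until the file disappears.
    """
    path = Path(path)
    for method in methods:
        if not path.exists():
            return None
        try:
            method(path)
        except OSError:
            pass  # locked or otherwise failed: escalate to the next method
    if not path.exists():
        return None

    def retry_loop() -> None:
        # Stands in for the cmd script that loops until the file is deleted.
        attempts = 0
        while path.exists() and (max_retries is None or attempts < max_retries):
            try:
                methods[0](path)
            except OSError:
                attempts += 1
                time.sleep(retry_interval)

    worker = threading.Thread(target=retry_loop, daemon=True)
    worker.start()
    return worker
```

Doing the first attempts inline and only handing the stubborn case to a daemon thread is a choice made for this sketch; the caller stays unblocked either way.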

This is not included in 1.4.0.7 Stable, but I'm sure he's gonna include it in future versions.
I've personally used it in my branch and so far so good (although I haven't brought it to my production environment just yet).


In other words, this is a long, convoluted way to say that:

1) We're aware of it
2) Grandmaster fixed it

Re: Status Monitor bug

Posted: Sat Nov 02, 2024 12:34 pm
by emcodem
A potential workaround is to simply delete the jsons in the monitor folder.
Your screenshot suggests that the bug always happens at the same spot. If so, you could insert a cmd processor after the AirSPD-1 processor, executing on error (or always), with this content:

del C:\FFAStrans\Processors\db\cache\monitor\%s_job_id%*
If none of that helps, you could at least try deleting all the jsons in the monitor folder periodically. Don't worry if you delete something that is currently running: those files should reappear after a few seconds.
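If you'd rather schedule that periodic cleanup outside FFAStrans, a minimal Python sketch could look like this (hypothetical helper names; the five-minute default interval is an arbitrary assumption). A file that a host still holds open raises PermissionError on Windows and is simply skipped until the next round. Passing a pattern such as "<job id>*" mirrors the per-job del command above.

```python
import time
from pathlib import Path
from typing import List


def purge_tickets(monitor_dir, pattern: str = "*.json") -> List[str]:
    """Delete matching tickets, skipping locked ones; return the names deleted."""
    removed = []
    for path in Path(monitor_dir).glob(pattern):
        try:
            path.unlink()
            removed.append(path.name)
        except FileNotFoundError:
            pass  # already removed by the host in the meantime
        except PermissionError:
            pass  # still held open by a host; retry on the next round
    return sorted(removed)


def purge_forever(monitor_dir, interval_s: float = 300.0) -> None:
    """Run purge_tickets every interval_s seconds until interrupted."""
    while True:
        purge_tickets(monitor_dir)
        time.sleep(interval_s)
```

For example, run purge_forever(r"C:\FFAStrans\Processors\db\cache\monitor") from a scheduled task; tickets of jobs that are still running should be rewritten by their hosts shortly afterwards.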