emcodem wrote: ↑Wed Nov 13, 2024 11:37 pm
However, this case should be much simpler because the reboot solved it. I guess one of our processes is just still holding the file handle and won't let go of it. But we don't know currently if it is ffastrans itself or webint.
Maybe next time you can apply the procedure to find out which process is holding a file handle to the affected file described here:
https://serverfault.com/questions/1966/ ... in-windows
The next occurrence didn't take long.
This time it was jobs of the same workflow as before, but they were stuck in the web interface and in the status monitor too:
- Снимок экрана 2024-11-14 110110.png (60.43 KiB)
I found that these files are locked only by the host machine (SRV-CINEGY1), which serves the FFAStrans directory and also does local processing:
- FFAStrans dir server3.png (37.77 KiB)
Opened by Administrator for reading:
- FFAStrans dir server.png (448.44 KiB)
I also managed to start Process Monitor on SRV-CINEGY1 about 13 minutes before the jobs got stuck. It seems that the problem started when cmd.exe appeared:
- FFAStrans dir server2.png (1.72 MiB)
Here is the event list from Process Monitor, filtered by one of the stuck JSON files (20241114-0613-4367-35a0-e076c917518b~1-0-0.json):
https://dropmefiles.com/LHJuA
And the full event list without a filter, which contains all three JSON files (20241114-0615-1434-9971-43cca26ef65f~1-0-0.json, 20241114-0613-4367-35a0-e076c917518b~1-0-0.json, 20241114-0616-4487-7178-ca416d100302~1-0-0.json):
https://dropmefiles.com/wuW7z
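For anyone who wants to dig through such a log without the Process Monitor GUI, here is a minimal Python sketch (not part of FFAStrans) that counts which process performed which operation on a given file. It assumes the log was exported via File > Save as CSV with the default columns ("Process Name", "Operation", "Path", "Result"); the function name summarize_events is made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_events(csv_text, path_fragment):
    """Count (process, operation, result) triples for events whose Path
    contains path_fragment (case-insensitive).

    Assumes the default Process Monitor CSV column names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fragment = path_fragment.lower()
    counts = Counter()
    for row in reader:
        path = (row.get("Path") or "").lower()
        if fragment in path:
            key = (row.get("Process Name"), row.get("Operation"), row.get("Result"))
            counts[key] += 1
    return counts
```

To use it, read the exported file with encoding="utf-8-sig" (the export may start with a byte order mark), pass the text together with part of the stuck file name, and print counts.most_common(). Results such as SHARING VIOLATION next to a process name are a hint at who is competing for the file.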
emcodem wrote: ↑Wed Nov 13, 2024 11:37 pm
Also, if I remember correctly, on NetApp and Isilon we had to opt in to a related feature called "oplocks" (opportunistic file locking). Maybe you can check the storage documentation for a related setting and report back about it?
For the farm we use self-built NASes and servers, which are Windows Server 2019+ machines with hardware RAID controllers. In this case, the host machine for the FFAStrans dir is a regular Windows 10 Pro machine.
It looks like Windows has registry settings for disabling oplocks for SMB1:
https://support.storeporter.com/hc/en-u ... 20networks
But it seems that this is no longer relevant, as SMB1 is disabled by default in current versions of Windows.
emcodem wrote: ↑Wed Nov 13, 2024 11:37 pm
Don't get me wrong, I am not sure whether the problem is related to the web interface's mere existence at all; it is absolutely possible that the same error would occur when FFAStrans runs alone. We just don't know currently.
This time I didn't restart the server to solve the problem. I stopped the “FFAStrans REST-Service” and “FFAStrans Webinterface” services one by one on all servers in the farm.
After all services were stopped, the files were still locked (I could read them but could not move them) and were still held by System.
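The "readable but not movable" state can be checked from a script as well. Below is a minimal Python sketch under stated assumptions: it tries to read one byte and then briefly renames the file to a temporary name and back. The name probe_file and the ".lockprobe" suffix are hypothetical. On Windows a rename fails while another handle is open without delete sharing, which matches the symptom above. Note that renaming a job file in a watched directory, even briefly, may itself confuse a running service, so this is better tried on a stuck file only.

```python
import os


def probe_file(path):
    """Return (readable, movable) for path.

    'movable' is tested by renaming the file to a temporary name
    in the same directory and immediately renaming it back.
    """
    readable = False
    try:
        with open(path, "rb") as fh:
            fh.read(1)
        readable = True
    except OSError:
        pass

    probe = path + ".lockprobe"  # hypothetical temporary name
    try:
        os.rename(path, probe)  # raises OSError if a handle blocks rename
    except OSError:
        return readable, False
    os.rename(probe, path)  # restore the original name
    return readable, True
```

A result of (True, False) corresponds to what I saw: the content can be read, but some open handle prevents moving or deleting the file.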
I then started all the services again, and after starting them on the last server I noticed that the JSON files had disappeared from the monitor directory.
I don't know after which server and service this happened (next time I'll wait longer after restarting each service), but the last server was the FFAStrans web interface server, and it failed to start the “FFAStrans REST-Service” service at first; it succeeded only on the fourth attempt.
When starting the service there was an error (I don't remember the number), followed by these warnings in the event log:
Child process [11576 -\\\192.168.100.231\FF_Install\processors\rest_service.exe /ErrorStdOut] finished with 1
and
The “FFAStrans REST-API Service” service terminated unexpectedly. It has done this 3 time(s).