Log DB folder "def_log" getting huge

Den
Posts: 10
Joined: Wed Mar 04, 2020 7:06 pm

Log DB folder "def_log" getting huge

Post by Den »

Hi everyone,

I've recently noticed that the \db\def_log folder is growing excessively large, containing over 50,000 files and occupying more than 80 GB. This folder seems to be constantly receiving new files - except when all the processes are stopped. Below is an example of the output from one of these files. Is this accumulation of files normal behavior for our system? Additionally, what steps can I take to prevent this folder from becoming so large?

2023-11-29T17:41:33.222+01:00, started from: \\cent-transcode\transcoding\ffastrans\applikation\processors\db\configs\workflows\20170621133000.json
2023-11-29T17:41:33.222+01:00, cache path: "\\cent-transcode\transcoding\ffastrans\applikation\processors\db\cache"
2023-11-29T17:41:33.222+01:00, started with args: /AutoIt3ExecuteScript "\\cent-transcode\transcoding\ffastrans\applikation\processors\def_runner.a3x" /ErrorStdOut -sys "\\cent-transcode\transcoding\ffastrans\applikation\processors\db\configs\workflows\20170621133000.json" 9880
2023-11-29T17:41:33.289+01:00, parent pid is : 9880
2023-11-29T17:41:33.289+01:00, sleep timer is : 30000
2023-11-29T17:41:33.314+01:00, enum: "T:\Transcoding\Transcoding_Alpha\MP4\MP4_Stream\Input" @ 20170621-133708-043-C12BBB5C37C8
2023-11-29T17:41:48.622+01:00, enum: "T:\Transcoding\Transcoding_Alpha\MP4\MP4_Stream\Input" @ 20170621-133708-043-C12BBB5C37C8
2023-11-29T17:42:00.859+01:00, workflow stopped. Exiting....

I apologize if this question has already been addressed elsewhere, I couldn't find any related information. Thank you in advance for your help.
Regards
emcodem
Posts: 1812
Joined: Wed Sep 19, 2018 8:11 am

Re: Log DB folder "def_log" getting huge

Post by emcodem »

Hm, if you have that many files in def_log, there is something wrong I would assume. Look at the file names; they consist of:
hostname_wf_id_YYYY_mm_DD_PID

Where PID is the current PID of the watching process (one of the many exe_manager.exe instances). The PID should not really change unless you stop and start the watchfolder.
So if everything runs normally, you should see one file per day per workflow. The oldest log file should by default be 14 days old (as per the max_logs_age setting in /db/configs/ffastrans.json).
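
To check which workflows are affected, a rough sketch like the one below might help (Python, not an FFAStrans tool; the path is a placeholder and the parsing assumes the last four underscore-separated fields are year, month, day and PID). It groups the def_log file names by workflow and day and counts how many different PIDs show up; in a healthy setup every group should have exactly one PID:

Code: Select all

# Unofficial sketch: count distinct PIDs per workflow and day in def_log,
# based on the naming scheme hostname_wf_id_YYYY_mm_DD_PID described above.
# Parsing is done from the right because host names may contain underscores.
import os
from collections import defaultdict

LOG_DIR = r"C:\path\to\ffastrans\processors\db\def_log"  # placeholder, adjust to your installation

pids_per_group = defaultdict(set)
for name in os.listdir(LOG_DIR):
    parts = os.path.splitext(name)[0].split("_")
    if len(parts) < 5:
        continue  # does not match the expected naming, skip
    pid, day, month, year = parts[-1], parts[-2], parts[-3], parts[-4]
    prefix = "_".join(parts[:-4])  # hostname_wf_id
    pids_per_group[(prefix, f"{year}-{month}-{day}")].add(pid)

for (prefix, date), pids in sorted(pids_per_group.items()):
    if len(pids) > 1:
        print(f"{date}  {prefix}: {len(pids)} different PIDs (watcher restarted)")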

As you can see in your log, the workflow was started and stopped within the same minute. If this is what constantly happens on your system, it explains why you have so many files, because the PID would change every minute.
Do you know the reason why the watches are started and stopped every minute, e.g. is it something you do intentionally?
emcodem, wrapping since 2009 you got the rhyme?
Den
Posts: 10
Joined: Wed Mar 04, 2020 7:06 pm

Re: Log DB folder "def_log" getting huge

Post by Den »

Hi! Yesterday, I emptied the folder, but today it already contains over 1240 files. I'm uncertain about why the watches are repeatedly starting and stopping every minute. Our system has approximately 270 workflows, and each one involves multiple watches used by over 100 users who are constantly transcoding something new. I'm considering deleting everything and performing a new installation, of course, after creating a backup. Could you advise on where I might adjust the frequency at which the watches are triggered?

Thank you so much for your response.
emcodem
Posts: 1812
Joined: Wed Sep 19, 2018 8:11 am

Re: Log DB folder "def_log" getting huge

Post by emcodem »

The frequency setting is in workflow properties -> sleep timer. But this is not about starting/stopping workflows, it is just about how often the folder is scanned. Workflows are basically only started/stopped either manually or via API.

I don't really think it will pay off for you to recreate the installation; we should concentrate on finding the issue.
Is it possible that it is only a single workflow that freaks out, or do you see one new log file every minute for each and every workflow_id?

Also, regarding the log message we see:
2023-11-29T17:42:00.859+01:00, workflow stopped. Exiting....

This is logged only if both of the following conditions are met:
  • the processors\db\cache\status folder exists
  • but there is no file in this folder named ~start_WF_ID
Can you watch the processors\db\status directory and tell us what is happening there? Basically you should just see one ~start file per workflow, and nothing else should happen in this folder; nothing should appear or disappear whatsoever.
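
If you prefer a quick script over watching the folder by hand, a minimal sketch along these lines could do the counting (Python; the paths are placeholders and the exact start-marker naming may differ slightly from what I wrote above, so the match is deliberately loose):

Code: Select all

# Minimal sketch: compare the number of start-marker files in the status
# folder against the number of workflow definitions. Paths are placeholders;
# the marker naming may differ per version, hence the loose "start" match.
import glob
import os

STATUS_DIR = r"C:\path\to\ffastrans\processors\db\cache\status"        # placeholder
WORKFLOW_DIR = r"C:\path\to\ffastrans\processors\db\configs\workflows"  # placeholder

start_markers = [f for f in os.listdir(STATUS_DIR) if "start" in f.lower()]
workflows = glob.glob(os.path.join(WORKFLOW_DIR, "*.json"))

print(f"start markers: {len(start_markers)}, workflow definitions: {len(workflows)}")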

Also, can you send the cache\exe_log folder zipped? Or at least the files from one day...
emcodem, wrapping since 2009 you got the rhyme?
Den
Posts: 10
Joined: Wed Mar 04, 2020 7:06 pm

Re: Log DB folder "def_log" getting huge

Post by Den »

Hi!

I've taken a screenshot of the processors\db\status directory. The number of items inside the folder remains unchanged: there are 244 items in the folder, whereas we have around 231 workflows. Please find attached the zip file plus a screenshot of the folder, as you requested. Thanks!

https://we.tl/t-YBlPG7h4qv

Regards
emcodem
Posts: 1812
Joined: Wed Sep 19, 2018 8:11 am

Re: Log DB folder "def_log" getting huge

Post by emcodem »

OK, so after some discussion with @admin, we strongly suspect the storage to be the source of the problem (\\cent-transcode\transcoding\ffastrans\).
It appears that we do not always get the list of files from it correctly. This does not only apply to the "processors\db\cache\status" directory (the ~start.. files); we also see that we constantly detect changed workflows in the folder "db\configs\workflows".
E.g. in the exe manager logs you sent, we see a line like the following 4-5 times every hour (with some rare exceptions in certain hours, e.g. 1-3 in the morning):

Code: Select all

2023-11-17T00:03:52.551+01:00, exclude workflow with id: "20170621143133" from this host.
This log line should only appear if one of the workflows was changed and saved by you. The fact that we sometimes do NOT see such lines at 1-3 in the morning could point in the direction of the storage having trouble when it is under load, but this guess might be totally wrong.
Also, we see that on 22-23.11.2023 one of the 2 machines (Trans04) was turned off (or at least did not run ffastrans). On these days, the other machine shows a lower count of "exclude workflow with id" occurrences. Again, this could be a hint towards load problems on your storage.

It is not 100% clear whether your storage returns an empty list of files in these cases, or different file dates, or similar, but I think we can safely say that it does not return the actual content of the folders.

May I ask what type of storage this is? The hostname "cent-transcode" implies that it is not some huge storage like NetApp/Isilon or similar. Is it some Linux Samba server or similar?

Also, if you want to dig deeper, I wrote a program that logs changes in directories, using the exact same file listing technique that ffastrans uses (rename it from .txt to .exe):
modifiedlist.exe.txt
(954.5 KiB) Downloaded 176 times
If you are going to use this, execute modifiedlist.exe; it should bring up a GUI where you can enter a folder to watch. You should enter one folder of the ffastrans db there, either "\\cent-transcode\transcoding\ffastrans\processors\db\status" or "\\cent-transcode\transcoding\ffastrans\db\configs\workflows". Then leave it running overnight or similar.
The tool lists the folder's contents every second, and if a change compared to the previous second is detected, it logs a line to a fixed log file, C:\temp\diff.txt.
You can test whether it works by e.g. setting the watchfolder to "\\cent-transcode\transcoding\ffastrans\processors\db\cache\status" and then stopping one workflow. As a result you should see one line in c:\temp\diff.txt because a change of files was detected. BUT you should basically never see any log line if you don't manually change anything in ffastrans. In case you leave the tool running and see multiple log entries without manually stopping/starting workflows, you know that there is something fishy with the file lists your storage returns for the db\cache\status folder.
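
For reference, the same idea is only a few lines in any scripting language. Here is a rough Python equivalent (just an illustration, not the original tool; it uses plain directory listings rather than the exact listing technique FFAStrans uses, so treat its results as indicative only):

Code: Select all

# Rough Python equivalent of the idea behind modifiedlist.exe: snapshot a
# folder every second and append a log line whenever an entry appears,
# disappears, or changes size/mtime. Uses plain os.scandir(), NOT the
# listing technique the original AutoIt tool (and FFAStrans) uses.
import os
import time
from datetime import datetime

WATCH_DIR = r"\\cent-transcode\transcoding\ffastrans\processors\db\cache\status"  # adjust as needed
LOG_FILE = r"C:\temp\diff.txt"

def snapshot(path):
    snap = {}
    with os.scandir(path) as entries:
        for entry in entries:
            st = entry.stat()
            snap[entry.name] = (st.st_size, int(st.st_mtime))
    return snap

os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
previous = snapshot(WATCH_DIR)
while True:
    time.sleep(1)
    current = snapshot(WATCH_DIR)
    if current != previous:
        added = sorted(set(current) - set(previous))
        removed = sorted(set(previous) - set(current))
        changed = sorted(n for n in current.keys() & previous.keys() if current[n] != previous[n])
        with open(LOG_FILE, "a", encoding="utf-8") as log:
            log.write(f"{datetime.now().isoformat()} added={added} removed={removed} changed={changed}\n")
    previous = current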

Here is also the source code of the exe I sent above; it is only helpful for developers familiar with the AutoIt scripting language.
modifiedlist.au3.txt
(2.42 KiB) Downloaded 169 times
One final conclusion could be that the best solution for you is to move the ffastrans files to a different storage. The most tested setup (by steinar) is to serve the files from a small, dedicated Windows VM that does nothing but share the ffastrans installation and db files.
emcodem, wrapping since 2009 you got the rhyme?
emcodem
Posts: 1812
Joined: Wed Sep 19, 2018 8:11 am

Re: Log DB folder "def_log" getting huge

Post by emcodem »

One more addition: I did some measurements of the number of I/Os ffastrans does on its DB directory.
What I quickly noticed is that my preferred monitor setting "forget missing" causes a huge number of I/Os in the DB directory in cases where the watchfolders have thousands of files remaining (e.g. where I watch huge folder structures but don't remove the files after processing).
To lower the number of I/Os in the DB directory, you can either uncheck "forget missing" and instead set the workflow -> maintenance -> validate record setting to forget missing files, OR you can simply make sure not to watch folders with a huge number of resident files.
emcodem, wrapping since 2009 you got the rhyme?
Den
Posts: 10
Joined: Wed Mar 04, 2020 7:06 pm

Re: Log DB folder "def_log" getting huge

Post by Den »

Hi! Thank you so much for investing the time to analyze everything. I appreciate it very much.

The storage is NetApp actually. Cent-Transcode is only a Windows 2019 server with NetApp mapped as a network drive. So, all of the workflows would have a watch node pointing to something like this:

\\192.168.255.1\folder01\folder2\watchfolder

or

T:\folder01\folder2\watchfolder

After I executed the .exe you've kindly sent, the diff.txt file was immediately populated with many lines. Please find it attached.
diff.txt
(44.61 KiB) Downloaded 173 times
Yes, we could potentially move to another Linux NAS (like QNAP or Synology), but that would mean modifying all of the 200+ workflows, each with multiple watch folders. :(

Thanks again, and have a good week!
emcodem
Posts: 1812
Joined: Wed Sep 19, 2018 8:11 am

Re: Log DB folder "def_log" getting huge

Post by emcodem »

Nice, so we know at least a little bit more about what's happening, unfortunately not why it is happening... What we see in diff.txt looks exactly like what happens when you manually stop and start one workflow.

The diff.txt you sent shows 2 changes within one minute (plus the initial list of files). We see that one .start~ file disappears and a few seconds later re-appears with a new modified date:

Disappear:
[Name:.start~20220125-1553-5979-7051-0f271e81877e Date:20231130140354 Size:0]
Appear with new date:
[Name:.start~20220125-1553-5979-7051-0f271e81877e Date:20231204154138 Size:0]

This causes one new log file in the def_log folder because the watching PID changed.

Now, if you send us a diff.txt covering a longer period, we could see whether it is always the same workflow or whether the files disappear and reappear randomly.
You say you have the DB directory on NetApp; this is really not an optimal place for it. At least the NetApps that I know are not good with a huge amount of small I/Os in a short time, and with many watches FFAStrans can potentially cause many thousands of I/Os per second.
When considering a move of the ffastrans DB, you should not mix it up with moving the watchfolders. To move the DB, all you do is uninstall the ffastrans service, copy the ffastrans folder to a new location and re-install the service from there. Everything else stays as it is.
emcodem, wrapping since 2009 you got the rhyme?
Den
Posts: 10
Joined: Wed Mar 04, 2020 7:06 pm

Re: Log DB folder "def_log" getting huge

Post by Den »

Ok, got it.

I can send you the differences tomorrow again.

Just a small note, we don't have FFAStrans running as a service at all. I don't remember why, but it didn't work well when we tried running it as a service, so it's running as an application. Is that an issue?