Correct.
In single-socket configurations there are no NUMA nodes.
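If you ever want to double-check that on a given box, here's a minimal sketch using the Win32 call GetNumaHighestNodeNumber (FFAStrans runs on Windows); on a single-socket machine it should report exactly one node:

[code]
/* Minimal sketch: ask Windows how many NUMA nodes it sees.
   On a single-socket machine this should print "NUMA nodes: 1". */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest)) {
        fprintf(stderr, "query failed: %lu\n", GetLastError());
        return 1;
    }
    /* the API returns the highest node NUMBER, so the count is +1 */
    printf("NUMA nodes: %lu\n", highest + 1);
    return 0;
}
[/code]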
andrezagato wrote: ↑Wed Mar 01, 2023 5:50 pm
I have a friend who also uses FFAStrans at his company, and after we found out about the RAM, he also saw a substantial performance improvement in his encoding after increasing the RAM. He even tested with different clock speeds and noticed a change in performance. I will get more information and let you know.

That's about right, and there are two reasons for it:
1) CPU lanes
2) Cached frames
The first one is pretty intuitive.
A CPU has a certain number of "lanes" exposed through its socket, which it uses to talk to every other device on the motherboard, including RAM.
Back in the day there were a northbridge and a southbridge; the northbridge's duties (memory controller, PCIe lanes) have since moved onto the CPU die, the southbridge became the PCH (Platform Controller Hub), and on multi-socket systems UPI (Ultra Path Interconnect) now links the CPUs together.
Anyway, regardless of the names, the concept is the same: a CPU has a fixed budget of lanes, and the motherboard routes them to devices like RAM, SSDs and GPUs, each of which consumes part of that budget.
This is the main reason why server CPUs like AMD Epyc and Intel Xeon have more lanes than their consumer counterparts like AMD Ryzen and the Intel Core i9.
Having more lanes means more simultaneous connections and therefore more bandwidth.
For instance, if you run a CUDA program on a consumer CPU and add a second NVIDIA GPU in SLI, you probably won't gain much, because you wouldn't have enough lanes to feed both GPUs effectively anyway.
Now, back to your use case: encoding. Regarding this first point, your CPU had enough lanes available, but only one RAM slot was populated, so the memory controller was stuck in single-channel mode and could allocate and read memory only relatively slowly. When you filled the other slots, more channels became available and accesses were interleaved across all of them, making allocation and de-allocation (and therefore encoding) faster.
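If you want to see the channel effect for yourself, here's a rough, illustrative probe (plain C, nothing FFAStrans-specific): copy a buffer much larger than the CPU caches and time it, once with one stick installed and once with all slots populated. The absolute numbers don't matter, only the before/after ratio:

[code]
/* Rough memory-bandwidth probe: copy a buffer far larger than the L3
   cache and time it. Multi-channel operation should show a noticeably
   higher GiB/s figure. This is illustrative, not a rigorous benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (512u * 1024u * 1024u)  /* 512 MiB, well beyond any L3 */
#define PASSES   8

int main(void)
{
    char *src = malloc(BUF_SIZE);
    char *dst = malloc(BUF_SIZE);
    if (!src || !dst)
        return 1;
    memset(src, 0xAB, BUF_SIZE);         /* touch pages so they're mapped */
    memset(dst, 0x00, BUF_SIZE);

    clock_t t0 = clock();
    for (int i = 0; i < PASSES; i++)
        memcpy(dst, src, BUF_SIZE);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double gib = (double)BUF_SIZE * PASSES / (1024.0 * 1024.0 * 1024.0);
    printf("~%.2f GiB/s\n", gib / secs);

    free(src);
    free(dst);
    return 0;
}
[/code]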
Now let's go to the second point.
When you encode a file, the source can have any resolution, framerate, bit depth and so on, and it may be very different from the output you're targeting.
In FFAStrans there's something very complex called "filter_builder.a3x", which takes the info from both ffprobe and mediainfo and uses it to build the right filter chain to reach your output.
Just like Avisynth, FFmpeg also has its own "frameserver", as it glues together a series of decoders, filters and encoders.
For instance, in the case of AVC-Intra, it means your file is decoded by libavcodec, runs through a series of filters, and the uncompressed a/v stream (which lives in your RAM) is handed to the encoder, x264 (here libx264 bundled inside FFmpeg), which encodes it into the raw video bitstream (raw_video.h264) that the FFmpeg muxer (or the BBC muxer) muxes into .mxf or whatever container you chose, on the fly (meaning you won't see two files, just one).
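To make the data flow clearer, here's a toy model of that pipeline. Every function in it is a made-up stand-in (decode_next, run_filters and encode_and_mux are hypothetical, NOT the real libav* API); the point is just that uncompressed frames only ever live in RAM and a single container file comes out the other end:

[code]
/* Toy model of the pipeline: decoder -> filters -> encoder -> muxer.
   All functions are hypothetical stand-ins for illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *pixels;   /* uncompressed picture data, held in RAM */
    size_t         size;
} Frame;

/* Stand-in decoder: yields a few dummy 1080p frames, then NULL (EOF). */
static Frame *decode_next(void)
{
    static int remaining = 3;
    if (remaining-- <= 0)
        return NULL;
    Frame *f  = malloc(sizeof *f);
    f->size   = 1920u * 1080u * 3u;
    f->pixels = calloc(1, f->size);
    return f;
}

/* Stand-in filter chain: e.g. scaling / bit-depth conversion in place. */
static void run_filters(Frame *f)
{
    memset(f->pixels, 0x80, f->size);    /* pretend we transformed it */
}

/* Stand-in encoder + muxer: the "bitstream" goes straight into the one
   output file; no intermediate elementary stream ever hits the disk. */
static void encode_and_mux(FILE *out, const Frame *f)
{
    fwrite(f->pixels, 1, 64, out);       /* pretend 64 bytes of bitstream */
}

int main(void)
{
    FILE *out = fopen("output.mxf", "wb");
    if (!out)
        return 1;

    Frame *f;
    while ((f = decode_next()) != NULL) {
        run_filters(f);
        encode_and_mux(out, f);
        free(f->pixels);                 /* frame discarded once encoded */
        free(f);
    }
    fclose(out);
    return 0;
}
[/code]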
Given that some filters may require spatial/temporal access, frames are cached in RAM with malloc(), accessed as needed, and only discarded once they're no longer required, so you can see how RAM matters.
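As a toy illustration of why temporal access costs RAM (again hypothetical code, not FFmpeg's actual cache): a filter that needs the previous and next frame has to keep a small window of decoded frames alive, and the oldest one is only freed once nothing can reference it any more:

[code]
/* Toy frame cache for a temporal filter: holds a sliding window of
   WINDOW decoded frames and frees the oldest when a new one arrives.
   Hypothetical sketch, not FFmpeg's real caching scheme. */
#include <stdlib.h>
#include <string.h>

#define WINDOW 3                         /* previous, current, next */

typedef struct {
    unsigned char *data;
    size_t         size;
} Frame;

typedef struct {
    Frame *slot[WINDOW];
    int    count;
} FrameCache;

/* Push a freshly decoded frame; evict (free) the oldest if full. */
static void cache_push(FrameCache *c, Frame *f)
{
    if (c->count == WINDOW) {
        free(c->slot[0]->data);          /* oldest frame no longer needed */
        free(c->slot[0]);
        memmove(&c->slot[0], &c->slot[1], (WINDOW - 1) * sizeof(Frame *));
        c->count--;
    }
    c->slot[c->count++] = f;
}

int main(void)
{
    FrameCache cache = {0};
    for (int i = 0; i < 10; i++) {       /* simulate 10 decoded frames */
        Frame *f = malloc(sizeof *f);
        f->size  = 1920u * 1080u * 3u;
        f->data  = malloc(f->size);
        cache_push(&cache, f);
        /* a temporal filter could now read cache.slot[0..count-1] */
    }
    for (int i = 0; i < cache.count; i++) {
        free(cache.slot[i]->data);
        free(cache.slot[i]);
    }
    return 0;
}
[/code]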
Not just that: with multiple filters in play, RAM can and will be distributed across modules so that access is faster per filter. Of course this can't be done on an intra-filter basis, as memory needs to be contiguous at least within the same filter in the threadpool (it's a bit more complicated than that, but I won't elaborate now; maybe later if you're interested).
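Purely speculative sketch of what "one contiguous buffer per filter" could look like (made-up code, not FFmpeg's threadpool): each worker thread owns its own contiguous arena, so frames handled by one filter stay together while different filters' arenas can live in different parts of RAM:

[code]
/* Speculative sketch: one worker thread per filter, each owning a
   private, contiguous arena. Compile with -pthread. Illustration
   only; this is NOT how FFmpeg's threadpool actually works. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N_FILTERS   3
#define ARENA_BYTES (64u * 1024u * 1024u)   /* 64 MiB per filter */

typedef struct {
    int   id;
    char *arena;     /* contiguous scratch space owned by this filter */
} FilterCtx;

static void *filter_worker(void *arg)
{
    FilterCtx *ctx = arg;
    /* pretend to process frames inside the private arena */
    for (size_t i = 0; i < ARENA_BYTES; i += 4096)
        ctx->arena[i] = (char)ctx->id;      /* touch each page */
    printf("filter %d done\n", ctx->id);
    return NULL;
}

int main(void)
{
    pthread_t tid[N_FILTERS];
    FilterCtx ctx[N_FILTERS];

    for (int i = 0; i < N_FILTERS; i++) {
        ctx[i].id    = i;
        ctx[i].arena = malloc(ARENA_BYTES); /* contiguous per filter */
        pthread_create(&tid[i], NULL, filter_worker, &ctx[i]);
    }
    for (int i = 0; i < N_FILTERS; i++) {
        pthread_join(tid[i], NULL);
        free(ctx[i].arena);
    }
    return 0;
}
[/code]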
Disclaimer: I know very little about FFmpeg's threadpool, so a lot of what I wrote above is extrapolated from my knowledge of the Avisynth threadpool; I expect the two to work in a very similar manner, which would also explain the speed bump you got.
P.S. If you guys are interested in the inner workings of Avisynth and its thread pools, I can elaborate further. I could go on for hours xD
Cheers,
Frank