I think we're saying the same thing, just not getting each other xD
If you mean they're both Limited TV Range in Linear BT709 then sure, you can definitely do it without any conversion.
But... just to make things even clearer... a video or an image has three properties:
1) A color matrix
2) A color transfer (color curve)
3) Color primaries
We're gonna focus on the first one.
So basically a picture (or a video; the two are interchangeable here, since a video is made of frames, so pictures) can be in: BT601, BT709, BT2020.
Those three are color matrices: each one defines the coefficients used to convert between RGB and YUV, each one has its own characteristics, and each is employed in certain scenarios (typically BT601 for SD, BT709 for HD and BT2020 for UHD).
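To put some numbers on that, here's a minimal Python sketch (not AviSynth code; the names LUMA_COEFFS and rgb_to_ycbcr are made up for this example). It uses the luma coefficients from the three ITU-R recommendations and works on normalized values, ignoring range and bit depth:

```python
# Luma coefficients (Kr, Kb) for each matrix; Kg is derived as 1 - Kr - Kb.
LUMA_COEFFS = {
    "BT601": (0.299, 0.114),
    "BT709": (0.2126, 0.0722),
    "BT2020": (0.2627, 0.0593),
}


def rgb_to_ycbcr(r, g, b, matrix):
    """Convert normalized R'G'B' (0.0-1.0) to Y' (0.0-1.0) and Cb/Cr (-0.5 to 0.5)."""
    kr, kb = LUMA_COEFFS[matrix]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


# The same pure green pixel gets a different luma under each matrix.
for name in LUMA_COEFFS:
    y, cb, cr = rgb_to_ycbcr(0.0, 1.0, 0.0, name)
    print(name, round(y, 4), round(cb, 4), round(cr, 4))
```

Pure green ends up with a luma of about 0.587 in BT601, 0.7152 in BT709 and 0.678 in BT2020, which is why decoding with the wrong matrix shifts the colors.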
Now, when it comes to the color space (as in the pixel format) we can have:
1) RGB24
2) YUV 4:4:4 (yv24)
3) YUV 4:2:2 (yv16) and its interleaved version YUY2 but we'll leave that one alone
4) YUV 4:2:0 (yv12)
and technically other things like 4:2:1, 4:1:1, 4:1:0, 3:1:1 and other weird things, but we'll leave them alone for now.
That is the chroma sampling, which tells us how many samples of luma there are compared to the samples of chroma.
For instance, in yv24 there are as many samples of luma as samples of chroma, so if we have, let's say, a FULL HD 1920x1080 frame, we're gonna have the luma at 1920x1080 and the chroma at 1920x1080 as well, so both are full resolution.
Now, since the human eye has far more rods than cones, we perceive luma detail far better than chroma detail, so the vast majority of transmissions etc are in yv12, so 4:2:0, therefore we would have 1920x1080 luma but 960x540 chroma.
In RGB there's no separate luma and chroma, but all three channels are always the same size, so RGB is like saying yv24, so 4:4:4; in fact one could easily see the "24" in ConverttoRGB24() as the 24 in Converttoyv24(): three full resolution channels of 8 bits each.
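The plane sizes above can be worked out with a few lines of Python. This is just an illustrative sketch: SUBSAMPLING and plane_sizes are hypothetical names, and it only covers the three planar formats mentioned here, assuming even frame dimensions:

```python
# (horizontal divisor, vertical divisor) applied to the luma size
# to get the size of each chroma plane.
SUBSAMPLING = {
    "yv24": (1, 1),  # 4:4:4, chroma at full resolution
    "yv16": (2, 1),  # 4:2:2, chroma halved horizontally
    "yv12": (2, 2),  # 4:2:0, chroma halved in both directions
}


def plane_sizes(width, height, fmt):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for a frame."""
    div_x, div_y = SUBSAMPLING[fmt]
    return (width, height), (width // div_x, height // div_y)


for fmt in SUBSAMPLING:
    print(fmt, plane_sizes(1920, 1080, fmt))
```

For a 1920x1080 frame this gives 1920x1080 chroma in yv24, 960x1080 in yv16 and 960x540 in yv12.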
Diving further we have Limited Range vs Full Range.
1) Limited TV Range
2) Full PC Range
Inside the signal, the values can be represented either as Limited TV Range or as Full PC Range.
For 8bit, Limited TV Range means that the signal is gonna go from 0.0 to 0.7V, namely 16-235 for luma (16-240 for chroma), while in Full PC Range the signal is gonna go from 0 to 255.
For 10bit it's 64-940 for Limited TV Range and 0-1023 for Full PC Range, and so on for higher bit depths.
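Here's a rough Python sketch of how those luma levels scale with bit depth and how a Limited TV Range value maps to Full PC Range. The function names are invented for this example, and it assumes the common convention that full range uses every code value (0 to 2^n - 1) while the limited levels are the 8bit ones shifted left:

```python
def luma_range(bit_depth, full_range):
    """Return (black, white) luma code values for the given bit depth."""
    if full_range:
        return 0, (1 << bit_depth) - 1
    shift = bit_depth - 8
    return 16 << shift, 235 << shift


def limited_to_full(y, bit_depth=8):
    """Stretch a Limited TV Range luma value to Full PC Range, with clipping."""
    lo, hi = luma_range(bit_depth, False)
    _, full_hi = luma_range(bit_depth, True)
    scaled = (y - lo) * full_hi / (hi - lo)
    return min(max(round(scaled), 0), full_hi)


print(luma_range(8, False), luma_range(8, True))
print(luma_range(10, False), luma_range(10, True))
print(limited_to_full(16), limited_to_full(235))
```

Values below black or above white in the limited signal are simply clipped by this sketch; a real converter may handle them differently.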
This picture I took in Berlin several years ago, for instance, is in Full Range, as you can see in the waveform monitor (top right) where it goes outside the brown region:
This picture, which a friend of mine took in a church in Italy where he brought his child, is instead in Limited TV Range, as you can see it stays within the brown brackets:
So, in the end, to reply to your question, you can put an image in "Overlay", but in order to avoid any conversion, as things currently stand, the video and the image have to match in one of these ways:
1) Both in BT601 Limited TV Range
2) Both in BT601 Full PC Range
3) Both in BT709 Limited TV Range
4) Both in BT709 Full PC Range
5) Both in BT2020 Limited TV Range
6) Both in BT2020 Full PC Range
About the sampling, it doesn't matter if one is in yv12 and the other in yv16, or one in yv24 and the other in RGB, etc. The important thing is that they both have the same color matrix/primaries and the same range, otherwise they're not gonna match.
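If it helps, the rule above boils down to a tiny check. This is only a Python model of what I described, not anything AviSynth exposes: ClipProps and needs_conversion are hypothetical names, and the property values would have to come from wherever you know them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipProps:
    matrix: str       # "BT601", "BT709" or "BT2020"
    full_range: bool  # True = Full PC Range, False = Limited TV Range
    sampling: str     # "yv12", "yv16", "yv24", "RGB24", ...


def needs_conversion(video, image):
    """True when matrix or range differ; sampling is deliberately ignored."""
    return video.matrix != image.matrix or video.full_range != image.full_range


video = ClipProps("BT709", False, "yv12")
logo = ClipProps("BT709", False, "RGB24")
photo = ClipProps("BT601", True, "yv24")

print(needs_conversion(video, logo))   # same matrix and range, different sampling
print(needs_conversion(video, photo))  # different matrix and range
```

The first pair matches despite the different sampling, while the second pair would need its matrix and range converted before overlaying.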