Re: OpenAI whisper transcription workflow.
Posted: Fri Aug 04, 2023 4:25 pm
Hey @taner
Yeah, it takes a while until one gets familiar with the concepts of Python packaging...
faster-whisper seems to be the better choice, mainly because they follow the development of the original whisper project, so you get the best of both worlds. As far as I can see they are sometimes still affected by the repeat-forever issue, but I feel it is only a matter of time until all manifestations of this error are dealt with. What I am not 100% sure about is whether the reduced VRAM usage of faster-whisper comes from them quantizing the model, which would mean they could never reach the accuracy of the original model (i.e. if they used the original weights, the output would be even better).
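If that quantization question matters for your workflow, you can at least pick the precision yourself. A minimal sketch, assuming the faster-whisper pip package is installed; the model name, device and audio path are just placeholders:

    from faster_whisper import WhisperModel

    # compute_type controls the precision the CTranslate2 backend runs
    # the weights in: "int8" cuts VRAM considerably at some accuracy
    # cost, "float16" stays closer to the original model.
    model = WhisperModel("large-v2", device="cuda", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus metadata.
    segments, info = model.transcribe("audio.wav")
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")

So you could compare "int8" against "float16" on the same file to see how much accuracy the quantization actually costs you.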
Const-Me in contrast "only" shows off how to use the ggml format models very performant with windows but apart from that which is VERY interesting for developers of windows apps that want a native API but he doesnt seem to want to follow the development of the others, nor is he very responsive when others send code to his repository, so even i turn away from experimenting with it. It does not pay off for me to dive much deeper into Const's version because it needs to much trickery to run on linux and i always prefer cross OS compatibility when possible.
However, the repeat-forever stuff was relatively easy for me to overcome, and the concept could be applied in all the other whisper projects too:
https://github.com/Const-me/Whisper/issues/26
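To give a rough idea of that recover-on-error concept, here is a hedged sketch using the openai-whisper Python package. The repetition heuristic and the retry parameters are my own illustrative assumptions, not the exact fix from the issue above:

    import whisper

    def looks_stuck(segments, max_repeats=4):
        # Heuristic (an assumption, tune to taste): the same text emitted
        # for several consecutive segments usually means the decoder fell
        # into a repeat loop.
        texts = [s["text"].strip() for s in segments]
        for i in range(len(texts) - max_repeats + 1):
            window = texts[i:i + max_repeats]
            if window[0] and len(set(window)) == 1:
                return True
        return False

    model = whisper.load_model("medium")
    result = model.transcribe("audio.wav")  # audio.wav is a placeholder

    if looks_stuck(result["segments"]):
        # Retry with sampling enabled and without conditioning on the
        # previous text, which often breaks the decoder out of the loop.
        result = model.transcribe(
            "audio.wav",
            temperature=0.4,
            condition_on_previous_text=False,
        )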
The thing is, it is not a very good idea to recover on error (as I do); we must find ways to prevent the errors up front when talking to the model. Lots of research is being done around whisper, and I am sure it is just a matter of time until they figure it out. The question is whether OpenAI will share their results, or stop sharing at some point because they recognize that others provide cheaper and better whisper cloud services...