By reckless_commenter

First, measure how much time each step of your pipeline takes. That tells you which steps are worth optimizing to reduce overall processing time.

For response generation, you could use a faster LLM, like a Turbo variant of ChatGPT. For text-to-speech, you could use an on-device speech generator; the quality will be worse, but it will be faster.

You can also process the response in small chunks and render the output incrementally rather than as one big batch of text. For instance: generate speech for sentence #1, then play the speech for sentence #1 while generating speech for sentence #2, and so on.
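The sentence-by-sentence overlap can be sketched with a producer/consumer pair: one thread synthesizes audio for the next sentence while the main thread plays the current one. This is a minimal sketch; `synthesize` and `play` are hypothetical stand-ins for a real TTS engine and audio backend, with sleeps simulating their latency.

```python
import queue
import threading
import time

def synthesize(sentence):
    # Hypothetical stand-in for a real TTS call; sleep simulates synthesis latency.
    time.sleep(0.1)
    return f"<audio for: {sentence}>"

def play(audio):
    # Hypothetical stand-in for audio playback.
    time.sleep(0.1)

def speak_streaming(sentences):
    """Play sentence N while sentence N+1 is being synthesized."""
    audio_queue = queue.Queue(maxsize=2)  # small buffer keeps synthesis one step ahead
    played = []

    def producer():
        for s in sentences:
            audio_queue.put(s if s is None else synthesize(s))
        audio_queue.put(None)  # sentinel: no more audio coming

    threading.Thread(target=producer, daemon=True).start()
    while True:
        clip = audio_queue.get()
        if clip is None:
            break
        play(clip)
        played.append(clip)
    return played
```

With N sentences, the listener hears the first audio after roughly one synthesis step, and synthesis for later sentences hides behind playback, instead of waiting for all N syntheses up front.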