Sibyl and Genie
Later, the AI Avatar part of text2video was refactored into Sibyl (backend) and Genie (frontend) and turned into a product with many improvements, new features, and a revamped video design.
The project was tracked here.
Multiple news organizations showed interest in the product, so dedicated per-organization customizations were hard-coded into it.
Tech
- Some tech from the text2video project above.
- Sarvam AI for text-to-speech and translations.
- Anthropic Claude for text generation.
- ULCA for translations.
Experiments
Auto-generation workflow
- Auto-generate summaries for each new article and post them to Slack for human review.
- After review, generate short videos and post them to Slack for final review and posting to social media.
Model comparisons
- Ran experiments with multiple service providers and local models for summarization, text-to-speech, speech-to-subtitles, translations etc.
- Tried multiple regional languages.
Prompt engineering
- Ran experiments to optimize the summarization prompt.
- Ran Slack experiments with multiple article formats like listicles, atoms, FAQs, etc.
Learnings
Dates in prompts
As a best practice, always include the article's publish date and the current date (with weekday) alongside the article in the prompt. This helps the model avoid confusion about the tenses and dates mentioned in the article.
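A minimal sketch of how such a prompt could be assembled. The function name `build_summary_prompt`, the date format, and the instruction wording are illustrative assumptions, not the prompt actually used in Sibyl.

```python
from datetime import date


def build_summary_prompt(article_text: str, published: date, today: date) -> str:
    """Prefix the article with its publish date and the current date, weekday included."""
    fmt = "%A, %d %B %Y"  # weekday, day, month, year (assumed format)
    return (
        f"Current date: {today.strftime(fmt)}\n"
        f"Article publish date: {published.strftime(fmt)}\n\n"
        # Hypothetical instruction text; tune it for the real prompt.
        "Summarize the article below. Interpret relative expressions such as "
        "'yesterday' or 'next week' relative to the publish date.\n\n"
        f"Article:\n{article_text}"
    )


prompt = build_summary_prompt(
    "The council will vote on the budget tomorrow.",
    published=date(2024, 6, 3),
    today=date(2024, 6, 5),
)
print(prompt)
```

With both dates stated explicitly, the model has what it needs to tell that "tomorrow" in the article refers to a day that has already passed by the time the summary is generated.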
Storing human reviews for future training
It’s best to store the human reviews in a structured format for future training of the model. The stored reviews can be used to fine-tune the model or to create a feedback loop for continuous improvement.
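One possible structured format is a JSON Lines file with one record per review. The sketch below is an assumption for illustration: the `ReviewRecord` fields, the helper names, and the file name are hypothetical and do not describe the project's actual storage.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ReviewRecord:
    # All field names are hypothetical; adapt them to what reviewers actually record.
    article_id: str
    prompt: str
    model_output: str
    reviewed_output: str  # the text after human edits
    approved: bool
    reviewer: str
    reviewed_at: str  # ISO 8601 timestamp


def append_review(path: Path, record: ReviewRecord) -> None:
    """Append one review as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_reviews(path: Path) -> list[ReviewRecord]:
    """Read all stored reviews back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [ReviewRecord(**json.loads(line)) for line in f if line.strip()]


# Round-trip demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "reviews.jsonl"
    append_review(store, ReviewRecord(
        article_id="a-001",
        prompt="Summarize ...",
        model_output="Draft summary.",
        reviewed_output="Edited summary.",
        approved=True,
        reviewer="editor-1",
        reviewed_at="2024-06-05T10:30:00+05:30",
    ))
    loaded = load_reviews(store)
```

Keeping both `model_output` and `reviewed_output` in each record preserves the before/after pair, which is the form most fine-tuning and evaluation setups expect.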
Pronunciation Glossary
For accurate pronunciation of uncommon words, it’s best to feed the text-to-speech model a modified version of the text in which commonly mispronounced words are replaced with their phonetic spellings. This requires human review to identify and modify the mispronounced words. These replacements should also be stored for future training or tuning of the model and its parameters.
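The replacement step can be sketched as a whole-word substitution applied just before text-to-speech. The function name `apply_glossary` and the glossary entries below are made up for illustration; a real glossary would come from the human review described above.

```python
import re


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace whole-word glossary terms with their phonetic spellings (case-insensitive)."""
    if not glossary:
        return text
    # Longest terms first so multi-word entries win over shorter ones they contain.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        flags=re.IGNORECASE,
    )
    lookup = {term.lower(): phonetic for term, phonetic in glossary.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


# Hypothetical glossary entries, for illustration only.
glossary = {
    "ISRO": "iss-ro",
    "Thiruvananthapuram": "thiru-vanantha-puram",
}
tts_text = apply_glossary("ISRO opened an office in Thiruvananthapuram.", glossary)
print(tts_text)  # iss-ro opened an office in thiru-vanantha-puram.
```

The `\w`-based word boundaries are tuned for Latin-script text. Scripts that use combining vowel signs may need a different boundary rule, so treat this as a starting point rather than a multilingual solution.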