Sibyl and Genie

Later, the AI Avatar part of text2video was refactored into Sibyl (backend) and Genie (frontend) and turned into a product with many improvements, new features, and a revamped video design.

The project was tracked here.

Multiple news organizations showed interest in the product, so dedicated customizations for each organization were hard-coded into it.

Tech

Some tech from the text2video project above.
Sarvam AI for text-to-speech and translations.
Anthropic Claude for text generation.
ULCA for translations.

Experiments

Auto-generation workflow

Auto-generate a summary for each new article and post it to Slack for human review.
After review, generate a short video and post it to Slack for final review before it is posted to social media.
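
The two review gates above can be sketched as a small state machine. This is a minimal illustration, not the actual Sibyl code: the names Stage, Job, start_job and on_review are hypothetical, and summarize, render_video and post_for_review are placeholder callables standing in for the summarization model, the video renderer and the Slack posting step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Stage(Enum):
    SUMMARY_REVIEW = "summary_review"   # summary is waiting for a human
    VIDEO_REVIEW = "video_review"       # video is waiting for a human
    READY_TO_POST = "ready_to_post"     # approved for social media
    REJECTED = "rejected"


@dataclass
class Job:
    article_id: str
    article_text: str
    stage: Stage = Stage.SUMMARY_REVIEW
    summary: Optional[str] = None
    video_path: Optional[str] = None


def start_job(
    article_id: str,
    article_text: str,
    summarize: Callable[[str], str],          # hypothetical: wraps the LLM call
    post_for_review: Callable[[Job], None],   # hypothetical: posts to Slack
) -> Job:
    """Summarize a new article and send the summary for the first review."""
    job = Job(article_id, article_text, summary=summarize(article_text))
    post_for_review(job)
    return job


def on_review(
    job: Job,
    approved: bool,
    render_video: Callable[[str], str],       # hypothetical: returns a file path
    post_for_review: Callable[[Job], None],
) -> Job:
    """Advance the job after a reviewer approves or rejects the current stage."""
    if job.stage not in (Stage.SUMMARY_REVIEW, Stage.VIDEO_REVIEW):
        return job  # nothing is pending review
    if not approved:
        job.stage = Stage.REJECTED
    elif job.stage is Stage.SUMMARY_REVIEW:
        job.video_path = render_video(job.summary)
        job.stage = Stage.VIDEO_REVIEW
        post_for_review(job)  # second gate: final review of the video
    else:
        job.stage = Stage.READY_TO_POST
    return job
```

Passing the external steps in as callables keeps the review logic testable without a live Slack workspace or model endpoint.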

Model comparisons

Ran experiments with multiple service providers and local models for summarization, text-to-speech, speech-to-subtitles, translations, etc.
Tried multiple regional languages.

Prompt engineering

Ran experiments to optimize the summarization prompt.
Ran Slack experiments with multiple article formats like listicles, atoms, FAQs, etc.

As a best practice, always include the article's publish date and the current date (both with the weekday) alongside the article in the prompt. This keeps the model from misreading the tenses and relative dates mentioned in the article.
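
A minimal sketch of this practice follows. The function name build_summary_prompt, the header labels and the instruction wording are illustrative assumptions, not the prompt actually used in the project. Weekday names come from a fixed tuple so the output does not depend on the system locale.

```python
from datetime import date

WEEKDAYS = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def describe_date(d: date) -> str:
    """Format a date as 'Weekday, YYYY-MM-DD' (date.weekday() is 0 for Monday)."""
    return f"{WEEKDAYS[d.weekday()]}, {d.isoformat()}"


def build_summary_prompt(article_text: str, published: date, today: date) -> str:
    """Prefix the article with its publish date and the current date."""
    return (
        f"Article published on: {describe_date(published)}\n"
        f"Current date: {describe_date(today)}\n\n"
        # Illustrative instruction wording; tune it for the model in use.
        "Summarize the article below. Interpret relative dates such as "
        "'yesterday' or 'next week' relative to the publish date.\n\n"
        f"{article_text}"
    )


if __name__ == "__main__":
    print(build_summary_prompt(
        "The festival begins tomorrow.", date(2024, 6, 3), date(2024, 6, 5)
    ))
```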

Storing human reviews for future training

It’s best to store the human reviews in a structured format for future training of the model. These records can be used to fine-tune the model or to create a feedback loop for continuous improvement.
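
One possible structured format is a JSON Lines file with one review per line, sketched below. The field names (article_id, prompt, model_output, reviewer_decision, edited_output, reviewer_note) and the helper names are assumptions chosen for illustration; the project's real schema is not documented here.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ReviewRecord:
    article_id: str
    prompt: str                # exact prompt sent to the model
    model_output: str          # what the model produced
    reviewer_decision: str     # e.g. "approved", "edited" or "rejected"
    edited_output: Optional[str] = None  # reviewer's corrected text, if any
    reviewer_note: str = ""


def append_review(path: Path, record: ReviewRecord) -> None:
    """Append one review as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps regional-language text readable in the file.
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_reviews(path: Path) -> List[ReviewRecord]:
    """Read all stored reviews back into ReviewRecord objects."""
    with path.open(encoding="utf-8") as f:
        return [ReviewRecord(**json.loads(line)) for line in f if line.strip()]
```

Keeping the prompt, the raw output and the reviewer's edit together means each line can later serve as a training pair (model_output versus edited_output) without joining against other data.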

Pronunciation Glossary

For accurate pronunciation of uncommon words, it’s best to feed the text-to-speech model a modified version of the text in which commonly mispronounced words are replaced with their phonetic spellings. This requires human review to identify and modify the mispronounced words. The same replacements should be stored for future training of the model or tuning of its parameters.
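
A minimal sketch of the substitution step, assuming the glossary is a plain mapping from a word or phrase to its phonetic respelling. The function name and the sample respellings are made up for illustration. Matching is case-sensitive and uses `\w` lookarounds as word boundaries, which may need adjusting for scripts that rely on combining marks.

```python
import re
from typing import Mapping


def apply_pronunciation_glossary(text: str, glossary: Mapping[str, str]) -> str:
    """Replace whole-word glossary terms with their phonetic respellings."""
    if not glossary:
        return text
    # Longest terms first so multi-word entries win over the words inside them.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    )
    return pattern.sub(lambda m: glossary[m.group(1)], text)


if __name__ == "__main__":
    # Hypothetical entries; real ones come from human review of TTS output.
    glossary = {"Kozhikode": "KO-zhi-code", "New Delhi": "nyoo DEL-ee"}
    print(apply_pronunciation_glossary(
        "Flights from New Delhi to Kozhikode.", glossary
    ))
```

Only the text sent to the speech model is modified; subtitles and on-screen text would still use the original spelling.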

Scroll AI Experiments