AI Story & Conversation Maker with TTS Integration
Introduction
Imagine a tool that turns a script into lifelike conversations and immersive stories using AI-driven text-to-speech. Meet AI Story & Conversation Maker, a Streamlit-powered web application that generates multi-voice audio content through ElevenLabs’ TTS API. Whether you’re a storyteller, a developer, or a creative looking for unique audio content, this tool simplifies crafting, managing, and listening to conversations between multiple characters. Let’s look at what it offers and how to set it up.
What Is the AI Story & Conversation Maker?
The AI Story & Conversation Maker generates multi-character audio stories and conversations by integrating with ElevenLabs’ Text-to-Speech API. It provides a user-friendly interface to manage stories, assign voices to characters, generate audio line by line or in full, and keep the resulting files organized. Whether you’re working on an audiobook, an interactive game, or a storytelling podcast, the tool helps you produce polished audio with little manual work.
Key Features
1. Story Management
- Create and Manage Stories: Start fresh or load existing stories.
- Save Progress: Keep track of dialogue, audio files, and metadata.
- Organized Output: Audio files and metadata are neatly stored for easy access.
- Story Compilation: Combine individual dialogue lines into a complete, downloadable audio file.
2. Voice Management
- Character Voice Aliasing: Assign unique names to voices for better organization.
- Flexible Voice Selection: Supports all voices available in ElevenLabs’ API.
- Customizable Settings: Adjust voice parameters like stability, similarity boost, and style for every dialogue line.
- Real-time Updates: Refresh voice lists directly from the ElevenLabs API.
3. Audio Generation
- Line-by-Line Audio Creation: Generate audio for individual lines of dialogue.
- Full Story Compilation: Combine all generated lines into a single audio file.
- Preview & Download: Listen to audio files directly in the app and download them.
- Timestamps: Automatically generate a JSON file with timestamps for each audio segment.
4. Intuitive User Interface
- Easy Dialogue Editing: Add rows for dialogue, assign voices, and tweak settings.
- Character & Voice Selection: Pick voices for each line quickly.
- Voice Settings Adjustment: Customize parameters like stability and style directly within the editor.
- Story History: View and access your saved stories seamlessly.
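To make the timestamp feature more concrete, here is a minimal sketch of how per-line start and end offsets could be derived when lines are played back to back. The function name `build_timestamps` and the JSON field names are assumptions made for illustration; the format the app actually writes may differ.

```python
import json


def build_timestamps(lines):
    """Compute start/end offsets for lines played back to back.

    `lines` is a list of (character, text, duration_ms) tuples in playback
    order. The field names below are illustrative, not the app's schema.
    """
    timestamps = []
    cursor_ms = 0  # running position in the compiled audio
    for index, (character, text, duration_ms) in enumerate(lines):
        timestamps.append({
            "index": index,
            "character": character,
            "text": text,
            "start_ms": cursor_ms,
            "end_ms": cursor_ms + duration_ms,
        })
        cursor_ms += duration_ms
    return timestamps


# Hypothetical durations; in practice they would come from the generated
# audio (with Pydub, len(segment) gives a segment's length in milliseconds).
demo_lines = [
    ("Narrator", "Once upon a time.", 1800),
    ("Mira", "Who is there?", 1200),
]
print(json.dumps(build_timestamps(demo_lines), indent=2))
```

Keeping a running cursor means each entry's start equals the previous entry's end, so the offsets stay consistent with a simple concatenation of the audio segments.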
How to Get Started
Prerequisites
Before running the application, ensure you have the following:
- Python 3.7+
- ElevenLabs API key
- Git (to clone the repository)
Installation
- Clone the Repository
bash
git clone https://github.com/AnandBhandari1/ai_tts_story_conversation_creator
cd ai_tts_story_conversation_creator
- Install Dependencies
bash
pip install -r requirements.txt
- Set Up ElevenLabs API Key
Get your API key from ElevenLabs and set it as an environment variable.
For Windows:
bash
set ELEVENLABS_API_KEY=your-api-key-here
For Linux/Mac:
bash
export ELEVENLABS_API_KEY=your-api-key-here
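On the Python side, a missing key is easier to diagnose if it is checked once at startup. The sketch below shows one way to do that; the helper `load_api_key` is hypothetical and is not taken from the project's source, only the variable name `ELEVENLABS_API_KEY` comes from the steps above.

```python
import os


def load_api_key(env=None):
    """Return the ElevenLabs API key from the environment or fail clearly.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; export it before starting the app."
        )
    return key


# Demonstration with a stand-in mapping instead of the real environment.
print(load_api_key({"ELEVENLABS_API_KEY": "your-api-key-here"}))
```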
Usage
- Run the Application
Start the web app with Streamlit:
bash
streamlit run src/app.py
- Create a Story
  - Enter a story title in the sidebar and click “Create Story”.
- Set Up Character Voices
  - Use the Voice Management section to map characters to specific voices from ElevenLabs.
  - Assign aliases for better character identification.
- Add Dialogue
  - Add rows for each line of dialogue.
  - Assign a character/voice and input the text.
  - Adjust voice settings like stability, style, and similarity boost.
- Generate Audio
  - Click the 🔊 button to generate audio for individual lines.
  - Use the “Generate Full Story” button to combine all lines into one audio file.
- Download Files
  - Download individual lines, the full story, or timestamps in JSON format.
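For readers curious about what happens behind the 🔊 button, the sketch below builds (but does not send) one text-to-speech request per dialogue line against ElevenLabs' public REST endpoint. It is an illustration under assumptions: the app may use the official SDK instead, and the alias table, voice IDs, and `build_tts_request` helper are all hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, voice_id, text, voice_settings):
    """Prepare a POST request for one dialogue line without sending it."""
    body = json.dumps({"text": text, "voice_settings": voice_settings})
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{urllib.parse.quote(voice_id, safe='')}",
        data=body.encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical alias table: character name -> ElevenLabs voice ID.
voice_aliases = {"Narrator": "voice-id-narrator", "Mira": "voice-id-mira"}

# Dialogue rows as (character, text); the settings values are the
# defaults this article lists.
dialogue = [("Narrator", "Once upon a time."), ("Mira", "Who is there?")]
settings = {"stability": 0.5, "similarity_boost": 0.75}

for character, text in dialogue:
    request = build_tts_request(
        "your-api-key-here", voice_aliases[character], text, settings
    )
    # Passing `request` to urllib.request.urlopen would perform the call
    # and return the audio bytes; it is skipped here to stay offline.
    print(request.get_method(), request.full_url)
```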
Voice Customization
The tool supports advanced voice settings to fine-tune your output:
- Stability (0.0 – 1.0): Controls how consistent the voice sounds.
  - Default: 0.50
- Similarity Boost (0.0 – 1.0): Adjusts how closely the voice matches the sample.
  - Default: 0.75
- Style (0.0 – 1.0): Influences the intensity of speaking style.
  - Default: 0.0
- Speaker Boost: Enhances voice clarity.
  - Default: True
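The settings and defaults above can be captured in a small data structure that rejects out-of-range values before any request is made. This is a minimal sketch, not the app's own code: the class name `VoiceSettings` and the method `to_payload` are assumptions, while the field names follow the `voice_settings` keys used by the ElevenLabs API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Per-line voice settings with the defaults listed above."""

    stability: float = 0.50
    similarity_boost: float = 0.75
    style: float = 0.0
    use_speaker_boost: bool = True

    def __post_init__(self):
        # All three numeric settings share the same 0.0 - 1.0 range.
        for name in ("stability", "similarity_boost", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(
                    f"{name} must be between 0.0 and 1.0, got {value}"
                )

    def to_payload(self):
        """Return the settings as a plain dict suitable for a JSON body."""
        return asdict(self)


# A more expressive variant of the defaults for a dramatic line.
print(VoiceSettings(stability=0.3, style=0.6).to_payload())
```

Validating in `__post_init__` keeps a bad slider value from ever reaching the API, where it would otherwise surface as a less obvious request error.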
Future Enhancements (TODO)
- AI-Based Story Generation: Integrate Gemini API or other AI models to auto-generate story content.
- Improved User Experience: Add more voice effects and interactive editing features.
Why Use This Tool?
- Fast and Efficient: Automates audio creation with minimal effort.
- Highly Customizable: Control voices, settings, and character dialogues.
- Organized Workflow: Stories, audio files, and metadata are automatically structured.
- Free and Open-Source: Modify or expand the tool to suit your needs.
License
This project is licensed under the MIT License.
Acknowledgments
Special thanks to:
- ElevenLabs for their powerful text-to-speech API.
- Streamlit for making web applications simple and interactive.
- Pydub for audio processing.
Support
Encounter a problem? Open an issue in the GitHub repository: https://github.com/AnandBhandari1/ai_tts_story_conversation_creator
With this tool, you can bring your stories to life, create engaging character conversations, and explore what AI-powered audio makes possible. Whether for content creation, storytelling, or entertainment, the AI Story & Conversation Maker is a practical starting point.