AetherAI

Screenshots (images not included): the UI; text-to-image input, processing, and the generated image; image-to-text content generation and the generated content; and a second example generating speech content from an uploaded image.

Inspiration

In today’s world, content creation is everywhere, from social media posts, marketing materials, banners, and flyers to professional presentations. While there are AI tools that generate images or captions, most do not provide structured options tailored to specific output types such as Banner, Poster, Pamphlet, Flyer, or Social Media posts. We wanted to build a system that goes beyond generic image generation and lets users choose the style and type of the visual, while also extracting meaningful content from images for blogs, marketing, or social media. This flexibility makes content creation faster, more precise, and more accessible for individuals and small teams without professional designers.

What it does

Our project is a full-stack AI application with two main features:

Text-to-Visual Generation: Users provide a text prompt and select a type: Banner, Poster, Pamphlet, Flyer, or Social Media. The system generates a high-quality, professionally styled image tailored to the selected type using Hugging Face’s GPT-OSS / Stable Diffusion API. Unlike most other tools, our application adapts the prompt to the chosen visual format (e.g., wide horizontal banner vs. vertical poster), so that layout, resolution, and style suit the intended use.
Image-to-Content Generation: Users upload an image and select a content type: Social Media, Blog, Product Description, Speech, Marketing, or Email. Our backend processes the image and generates contextual content through an API call. This helps businesses, content creators, and marketers quickly produce captions, posts, product descriptions, or speech points without manual brainstorming.
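
As a rough illustration of the format-aware prompt adaptation described above, the sketch below maps each visual type to a size and a style hint and appends the hint to the user's prompt. The names (VisualType, FORMAT_PRESETS, buildVisualRequest), the pixel dimensions, and the hint wording are all assumptions made for this example; they are not taken from the AetherAI code.

```typescript
// Hypothetical sketch: names, sizes, and hints below are illustrative assumptions.
type VisualType = "Banner" | "Poster" | "Pamphlet" | "Flyer" | "Social Media";

interface FormatPreset {
  width: number; // assumed output width in pixels
  height: number; // assumed output height in pixels
  styleHint: string; // text appended to the user's prompt
}

const FORMAT_PRESETS: Record<VisualType, FormatPreset> = {
  Banner: { width: 1536, height: 512, styleHint: "wide horizontal banner layout with room for a headline" },
  Poster: { width: 768, height: 1152, styleHint: "vertical poster layout with a strong focal point" },
  Pamphlet: { width: 768, height: 1024, styleHint: "clean pamphlet layout with clear sections" },
  Flyer: { width: 768, height: 1024, styleHint: "bold, eye-catching flyer layout" },
  "Social Media": { width: 1024, height: 1024, styleHint: "square social media post layout" },
};

interface VisualRequest {
  prompt: string;
  width: number;
  height: number;
}

// Combine the user's prompt with the preset for the selected type.
function buildVisualRequest(userPrompt: string, type: VisualType): VisualRequest {
  const preset = FORMAT_PRESETS[type];
  const prompt = `${userPrompt.trim()}, ${preset.styleHint}, professional design, high quality`;
  return { prompt, width: preset.width, height: preset.height };
}

const bannerRequest = buildVisualRequest("  Summer sale at a bookstore ", "Banner");
console.log(bannerRequest.prompt);
console.log(`${bannerRequest.width}x${bannerRequest.height}`);
```

Keeping the presets in a single lookup table means a new visual type only needs one new entry, and the same table can drive the type selector in the UI.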

How we built it

Frontend: Angular, Tailwind, FormsModule, and HttpClient, providing an interactive UI for prompt entry, file upload, and real-time display of outputs.
Backend: Node.js, Express.js, Multer for file uploads, and Hugging Face GPT-OSS for text and image processing.
AI Models: Hugging Face GPT-OSS (20B / 120B) for image generation, with an API key for text generation.
Other Tools: Base64 image encoding, REST APIs, CORS, and environment variables for API keys.
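
The Base64 image encoding step mentioned above might look something like the following minimal sketch, which turns raw image bytes into a data URL that can be embedded in a JSON request. The helper names toBase64 and toDataUrl are hypothetical and not from the project. With Multer's memory storage the upload is available as `req.file.buffer`, a Node.js Buffer, which is a Uint8Array and could be passed in directly; on Node, `buffer.toString("base64")` is the usual shortcut for the encoding itself.

```typescript
// Hypothetical helpers; not taken from the AetherAI source.

// Encode raw bytes as Base64 using the standard btoa function.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Wrap the Base64 payload in a data URL with the given MIME type.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${toBase64(bytes)}`;
}

// Two sample bytes ("Hi"), not a real image, just to show the shape of the output.
const sampleBytes = new Uint8Array([72, 105]);
console.log(toDataUrl(sampleBytes, "image/png")); // prints "data:image/png;base64,SGk="
```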

Challenges we ran into

API Integration: Integrating GPT-OSS for image-to-text content required careful handling of prompts and base64 images.
Dynamic Prompt Engineering: Designing prompts for different types (banner, poster, flyer) to produce high-quality visuals required experimentation.
Error Handling: Managing server errors, invalid uploads, and API limits was crucial for a smooth user experience.
Frontend-Backend Sync: Ensuring real-time updates and progress indication for both image and content generation taught us advanced Angular state management.

Through this project, we deepened our understanding of AI model usage, REST APIs, file handling, and prompt engineering.
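
To make the error-handling challenge concrete, here is a small sketch of how invalid uploads and API failures could be turned into clear messages before they reach the user. The allowed MIME types, the 5 MB limit, the message wording, and the names validateUpload and describeApiError are assumptions for illustration; the project's actual limits and messages are not stated.

```typescript
// Hypothetical limits and names; the actual values used by the project are not known.
const ALLOWED_MIME_TYPES: string[] = ["image/png", "image/jpeg", "image/webp"];
const MAX_UPLOAD_BYTES = 5 * 1024 * 1024; // assumed 5 MB cap

interface UploadInfo {
  mimeType: string;
  sizeBytes: number;
}

// Returns a user-facing error message, or null when the upload is acceptable.
function validateUpload(file: UploadInfo | undefined): string | null {
  if (!file) {
    return "Please choose an image to upload.";
  }
  if (ALLOWED_MIME_TYPES.indexOf(file.mimeType) === -1) {
    return "Unsupported file type. Use PNG, JPEG, or WebP.";
  }
  if (file.sizeBytes > MAX_UPLOAD_BYTES) {
    return "Image is too large (limit 5 MB).";
  }
  return null;
}

// Map an upstream HTTP status code to a message suitable for the UI.
function describeApiError(status: number): string {
  if (status === 429) {
    return "The AI service is rate limited. Please retry in a moment.";
  }
  if (status === 401 || status === 403) {
    return "The server's API key was rejected.";
  }
  if (status >= 500) {
    return "The AI service is temporarily unavailable.";
  }
  return "The request could not be completed.";
}

console.log(validateUpload({ mimeType: "application/pdf", sizeBytes: 1024 }));
console.log(describeApiError(429));
```

Checking the upload before calling the model avoids spending an API request on input that would fail anyway.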

Accomplishments that we're proud of

Type-Specific Output: Users can choose exact visual types (Banner, Poster, Pamphlet, Flyer, Social Media), which is not available in most existing AI tools.
Multi-Modal Functionality: Combines text-to-image and image-to-text generation in one application.
Real-Time Feedback: Interactive frontend showing generated images and content with clear error messages.
Customizable Prompts: Adapts AI generation according to the selected type for professional results.

What we learned

We learned how to design effective prompts to get high-quality outputs for different formats like banners, posters, and flyers. Integrating text-to-image and image-to-text models taught us how to handle both visual and textual data efficiently. We also gained experience with frontend-backend communication, file uploads, API error handling, and creating a smooth, interactive user experience.
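
One way to model the frontend side of this communication, including progress indication and error display, is a small state machine like the sketch below. It is framework-independent TypeScript; the type and function names (GenerationState, GenerationEvent, nextState, statusLabel) are hypothetical, and the project's actual Angular state handling may differ.

```typescript
// Hypothetical state model; the project's real state handling is not described in detail.
type GenerationState =
  | { status: "idle" }
  | { status: "loading"; startedAt: number }
  | { status: "success"; result: string }
  | { status: "error"; message: string };

type GenerationEvent =
  | { type: "submit"; at: number }
  | { type: "resolved"; result: string }
  | { type: "failed"; message: string }
  | { type: "reset" };

// Pure transition function: given the current state and an event, return the next state.
function nextState(state: GenerationState, event: GenerationEvent): GenerationState {
  switch (event.type) {
    case "submit":
      // Ignore a second submit while a request is already in flight.
      return state.status === "loading" ? state : { status: "loading", startedAt: event.at };
    case "resolved":
      // Only accept a result while a request is pending.
      return state.status === "loading" ? { status: "success", result: event.result } : state;
    case "failed":
      return state.status === "loading" ? { status: "error", message: event.message } : state;
    case "reset":
      return { status: "idle" };
  }
}

// Text the UI could show for each state.
function statusLabel(state: GenerationState): string {
  switch (state.status) {
    case "idle":
      return "Ready";
    case "loading":
      return "Generating...";
    case "success":
      return "Done";
    case "error":
      return `Error: ${state.message}`;
  }
}

let current: GenerationState = { status: "idle" };
current = nextState(current, { type: "submit", at: 0 });
console.log(statusLabel(current)); // prints "Generating..."
```

Because every state is an explicit variant, the template can show a spinner, a result, or an error message without juggling separate boolean flags.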

What's next for AetherAI

Video Generation: Extend the platform to produce animated banners or short clips for social media.
3D Content Creation: Integrate 3D model generation from text prompts for product visualization or AR/VR applications.
Editing Tools: Allow users to edit AI-generated images, add text overlays, or refine layouts.
Multi-Language Support: Generate content in multiple languages for global reach.
Offline Local Agent: Package GPT-OSS locally for offline usage in areas with low connectivity.

Built With

angular.js
api
env
express.js
gpt
huggingface
node.js
python
rest
tailwind

Updates

MANJU SRI V AIDS started this project — Sep 11, 2025 02:42 PM EDT
