Inspiration
Social Draft started from a simple personal problem: sometimes we know what we want to say, but not how to say it naturally. In everyday texting, the hard part is often not grammar, but tone — sounding friendly, direct, thoughtful, or casual without sounding too cold, too intense, or too much like AI. After testing small language models on mobile, we saw both the potential and the limitations of local inference. Then we found experimental tools for on-device fine-tuning, which pushed the idea further: what if users could not only run a model locally, but also personalize it locally? That became the core idea of Social Draft: a private, on-device reply assistant that helps users write in their own voice.
What it does
Social Draft helps users generate natural replies from short conversation context. The user chooses an intention or tone, such as direct, friendly, or thoughtful, and the app suggests replies that they can edit or send. We are not trying to build another messenger. Instead, Social Draft is a local writing engine that can eventually work across messaging apps through a standalone app, Share Sheet, Shortcuts, or an optional keyboard extension. The goal is not to replace the user’s voice. The goal is to help users express what they already mean.
How we built it
We tested multiple small models, including Qwen, Llama 3.2, Gemma, and TinyLlama-style models, to compare reply quality, latency, and mobile feasibility. Based on our experiments, Llama 3.2 3B gave us the best balance between social understanding and local deployment.

We also built a distillation pipeline and generated around 2,000 social-reply samples using Claude. The data was designed to avoid common AI-writing problems: being too formal, too long, too polished, or too emotionally over-explained.

On the model side, we explored supervised fine-tuning and LoRA adaptation. The model is trained around a simple structure: short conversation context + user intention + tone -> natural reply.

For deployment, we built an iOS prototype with a Swift frontend and local LLM inference using quantized models. This keeps private conversation data on device instead of sending it to a cloud API.

The most experimental part is on-device LoRA training. Instead of retraining the full model, we freeze the base model and train a small adapter from user-approved examples. In the future, when a user edits or selects a reply, that can serve as local training data for their personal model.
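The context + intention + tone -> reply structure can be sketched as a simple prompt template. This is an illustrative sketch only; the tag names and separators below are our own assumptions here, not the exact format shipped in the pipeline, which may use the model's chat template instead:

```python
def build_sample(context: str, intention: str, tone: str, reply: str) -> dict:
    """Format one distilled example as a (prompt, target) pair for SFT.

    The bracketed tags are hypothetical placeholders for illustration.
    """
    prompt = (
        f"[CONTEXT]\n{context}\n"
        f"[INTENTION] {intention}\n"
        f"[TONE] {tone}\n"
        f"[REPLY]\n"
    )
    return {"prompt": prompt, "target": reply}

# Example distilled sample in the shape described above:
sample = build_sample(
    context="Friend: are you coming tonight?",
    intention="decline politely",
    tone="friendly",
    reply="can't tonight, but next time for sure!",
)
```

Keeping the conditioning fields explicit like this is what lets the same base model switch between tones at inference time without any per-tone fine-tuning.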
Challenges we ran into
The biggest challenge was making the project more than a generic AI reply app. We had to define Social Draft as a private, personalized writing layer rather than another chatbot or messenger. Data quality was also difficult. Public dialogue datasets often felt too generic, while AI-generated replies easily became too polished or unnatural. We had to design a style rubric focused on short, casual, human-like texting. On the technical side, local deployment created real constraints around model size, latency, memory, iOS integration, LoRA conversion, and runtime compatibility. On-device training was even harder because training requires much more memory and computation than inference.
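Freezing the base model and training only a LoRA adapter is what makes on-device training even conceivable under these memory constraints. The back-of-the-envelope arithmetic below shows why; the dimensions are approximate Llama 3.2 3B-like values used purely for intuition, not measured numbers from our build:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA pair: A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)

# Approximate Llama 3.2 3B-like shape: hidden size ~3072, ~28 layers.
hidden, layers, rank = 3072, 28, 8

# Adapting the query and value projections in every layer (a common LoRA default):
per_layer = 2 * lora_params(hidden, hidden, rank)
total_adapter = layers * per_layer

print(f"{total_adapter:,} trainable parameters")
```

Under these assumptions the adapter is a few million trainable parameters, roughly three orders of magnitude fewer than full fine-tuning of a ~3B-parameter model, which is the gap that brings optimizer state and gradient memory within reach of a phone.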
Accomplishments that we're proud of
We are proud that Social Draft is not just a thin wrapper around a cloud API. We built toward a real local-first system: benchmarking small models, creating a social-reply dataset, experimenting with SFT and LoRA, and deploying local inference in an iOS prototype. We are also proud of pushing into on-device personalization. Even though it is still experimental, exploring local LoRA training made the project much more ambitious than a normal reply assistant. It forced us to think about what it means for users to own the model that learns from their private data. Most importantly, we are proud that the project connects a real human problem with a serious technical direction: private, user-owned personal AI.
What we learned
We learned that small local models can be useful when the task is carefully constrained. Instead of asking the model to be a general chatbot, we gave it a focused job: generate short, socially aware replies from short context and clear tone labels. We also learned that personalization is not just a feature. It is about ownership. In most AI products, user data improves a company-owned cloud model. In Social Draft, the goal is for user-approved data to improve the user’s own local adapter.
What's next for Social Draft
Next, we want to improve the quality and consistency of the reply model by manually filtering our distilled dataset and running more controlled SFT experiments. We also want to make personalization more visible. A strong next demo would show a clear before-and-after difference between the base model and a user-specific LoRA adapter. On the product side, we want to reduce friction by exploring Share Sheet, Shortcuts, or an optional keyboard extension, so Social Draft can work more naturally across existing messaging apps. Long term, we want Social Draft to become a private personal writing layer: a local model that keeps your messages private, learns your style with your permission, and helps you write more like yourself.