Value Arena - Hackathon Project Story

Inspiration

After a breakup, I found myself watching Alpha Arena, a platform where AI models make thousands of high-frequency trades with leverage and derivatives. It was impressive, but it hit me: this isn't how regular people invest. I have a job. I can't monitor 20 screens. I can't execute 500 trades before lunch.

So I asked myself: What if we forced AI to invest like a retail investor? Five trades per month. Thirty-day minimum holds. No leverage, no derivatives, no shortcuts. Could AI still beat the market with patience instead of speed?

That question became Value Arena—a platform where the world's leading AI models (Claude, GPT-4, Gemini, Grok, DeepSeek, Qwen) compete as value investors, not day traders. It's therapy, research, and education rolled into one.

What it does

Value Arena is an AI value investing competition platform with complete transparency:

Core Features:

Live Performance Dashboard: Track 6 AI models' portfolios in real time with growth projection charts
Portfolio Holdings: See every stock position, entry price, and whether it's long-term (30+ day hold) or short-term
Transaction Reasoning: Every buy/sell comes with the AI's complete thought process—market context, risk assessment, conviction level
Daily Market Reviews: Each AI publishes daily analysis of news, market conditions, and their strategic positioning
Individual Stock Analysis:
- Weekly: Deep fundamental analysis combining earnings reports and competitive positioning
- Daily: Real-time reactions to breaking news and price movements
Market News Feed: Auto-tagged headlines with all 6 AIs' impact predictions (short-term, long-term)

Strict Retail Investor Rules:

$100,000 virtual capital (70% long-term, 30% short-term)
5 trades per month maximum
30-day minimum hold for long-term positions
22 stocks only (no derivatives, options, shorting, or leverage)
Same public data everyone can access (RSS feeds, SEC filings, Yahoo Finance)
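
The rules above are simple enough to express as a small configuration object. This is only a sketch: the class and field names are hypothetical and not taken from the Value Arena codebase, but the numbers are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArenaRules:
    """Hypothetical container for the competition rules listed above."""
    starting_capital: float = 100_000.0
    long_term_share: float = 0.70      # the remaining 30% is the short-term account
    max_trades_per_month: int = 5
    min_hold_days: int = 30            # applies to long-term positions
    stock_pool_size: int = 22

    def account_split(self) -> dict[str, float]:
        """Split the starting capital into long-term and short-term accounts."""
        long_term = round(self.starting_capital * self.long_term_share, 2)
        short_term = round(self.starting_capital - long_term, 2)
        return {"long_term": long_term, "short_term": short_term}


RULES = ArenaRules()
print(RULES.account_split())
```

With the default values, each AI starts with $70,000 in the long-term account and $30,000 in the short-term account.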

Educational Value:

Users can learn from AI reasoning, compare strategies, understand why patience matters, and build conviction for their own investments.

How we built it

Tech Stack:

Backend: AWS Lambda (serverless architecture), PostgreSQL (main database), Redis (real-time price caching), DynamoDB (historical data)
AI Infrastructure:
- AWS Bedrock (Claude, multiple LLM support)
- OpenAI API (GPT-4)
- Google Gemini API
- xAI Grok API
- DeepSeek API
- Alibaba Qwen API
RAG System:
- AWS Bedrock Knowledge Base
- OpenSearch Serverless (vector database)
- Titan Embeddings V2 (1024-dimensional vectors)
Frontend: React, TailwindCSS, Recharts (data visualization)
Data Sources: RSS feeds (news), SEC EDGAR API (earnings), Yahoo Finance API (prices)

Architecture Highlights:

Token Optimization: Compressed conversation history from 50,000+ tokens to 7,200 tokens using structured state management (95% cost reduction)
RAG Memory System: Every decision stored as embeddings, retrieved before trading to help AIs learn from past mistakes
Quality-Weighted Retrieval: After 30 days, decisions are evaluated and high-quality ones weighted higher in future retrievals
Dual Account System: Separate long-term (70%) and short-term (30%) accounts enforce patience while allowing flexibility
Compliance Engine: 6 automatic rules validate every trade (stock pool, trade quota, wash trade period, wallet balance, etc.)
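
One way to picture quality-weighted retrieval is as a re-ranking step on top of vector similarity. The sketch below is an illustration, not the production logic: the `PastDecision` fields, the 0 to 1 quality score, and the `similarity * (0.5 + quality)` formula are all assumptions chosen to show the idea that well-evaluated decisions rise in the ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PastDecision:
    summary: str
    similarity: float            # similarity to the current query, 0..1
    quality: Optional[float]     # assumed 0..1 score set after the 30-day evaluation


def rerank(candidates, top_k=5, neutral_quality=0.5):
    """Order retrieved decisions by similarity scaled by their quality score.

    Decisions younger than 30 days have no score yet and get a neutral weight.
    """
    def score(decision):
        quality = neutral_quality if decision.quality is None else decision.quality
        return decision.similarity * (0.5 + quality)   # weight ranges from 0.5 to 1.5

    return sorted(candidates, key=score, reverse=True)[:top_k]


candidates = [
    PastDecision("A: chased a rally", similarity=0.90, quality=0.1),
    PastDecision("B: bought after earnings dip", similarity=0.80, quality=0.9),
    PastDecision("C: not yet evaluated", similarity=0.85, quality=None),
]
for decision in rerank(candidates):
    print(decision.summary)
```

In this example the slightly less similar but well-evaluated decision B ranks first, and the poorly evaluated decision A drops to last.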

Development Process:

Phase 1: Built core database schema and Lambda functions
Phase 2: Integrated 6 different AI APIs with unified prompt engineering
Phase 3: Implemented RAG system for AI memory across conversations
Phase 4: Developed frontend dashboard with real-time updates
Phase 5: Added news ingestion pipeline and auto-tagging system
Phase 6: Deployed to production with monitoring and logging

Challenges we ran into

1. Token Cost Explosion

The initial implementation saved the full conversation history for each AI. After 30 days, prompts had grown to 50,000+ tokens per decision, costing $20+ per trade.

Solution: Built structured state management: compress portfolio state to 700 tokens, keep only the last 20 key events (~2,000 tokens), and use RAG to retrieve relevant historical context (~3,000 tokens). The full prompt now totals ~7,200 tokens per decision (95% cost reduction).
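
The budget can be checked with a few lines of arithmetic, together with a sketch of the "last 20 events" trimming. The function and variable names are hypothetical. Note that 700 + 2,000 + 3,000 comes to about 5,700 tokens, so the ~7,200 total presumably includes other prompt content that is not itemized here, and that against a 50,000-token baseline the token count itself falls by roughly 86%.

```python
# Figures taken from the text above; names are illustrative only.
BUDGET = {"portfolio_state": 700, "recent_events": 2_000, "rag_context": 3_000}
TOTAL_AFTER = 7_200
BASELINE = 50_000
MAX_EVENTS = 20

listed_tokens = sum(BUDGET.values())            # tokens accounted for by the three parts
other_tokens = TOTAL_AFTER - listed_tokens      # remainder of the prompt, not itemized
token_reduction = 1 - TOTAL_AFTER / BASELINE    # fraction of tokens removed


def build_context(portfolio_state: str, events: list[str], retrieved: list[str]) -> str:
    """Assemble a compact prompt context instead of replaying the full history."""
    recent = events[-MAX_EVENTS:]               # keep only the most recent key events
    return "\n\n".join([
        "PORTFOLIO STATE\n" + portfolio_state,
        "RECENT EVENTS\n" + "\n".join(recent),
        "RETRIEVED MEMORY\n" + "\n".join(retrieved),
    ])


print(listed_tokens, other_tokens, f"{token_reduction:.1%}")
```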

2. AI Rate Limits

Running 6 AIs concurrently hit rate limits on all APIs, especially during daily decision windows.

Solution: Sequential execution with exponential backoff retry logic. Added rate limit tracking and graceful degradation (if one AI fails, others continue).
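
A minimal sketch of sequential execution with exponential backoff and graceful degradation follows. `RateLimitError` is a stand-in for whatever error each provider's client raises, and the retry count and delays are assumed values, not the ones used in production.

```python
import time


class RateLimitError(Exception):
    """Stand-in for a provider-specific rate-limit error (hypothetical)."""


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn, waiting base_delay * 2**attempt between rate-limited attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise                           # out of retries, let the caller decide
            sleep(base_delay * 2 ** attempt)


def run_all(models, sleep=time.sleep):
    """Run each model in turn; a failure is recorded and the rest continue."""
    results, failures = {}, {}
    for name, fn in models.items():
        try:
            results[name] = call_with_backoff(fn, sleep=sleep)
        except Exception as exc:                # graceful degradation
            failures[name] = repr(exc)
    return results, failures
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.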

3. Wash Trade Detection

Complex logic: the holding period is calculated from the first buy date, not the most recent purchase. Selling part of a position is allowed, but only once the 30-day lock has expired.

Solution: Database design with first_buy_date field separate from last_transaction_date. Compliance engine checks this before every sell order.
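
The check itself can be sketched in a few lines. The `first_buy_date` and `last_transaction_date` fields come from the description above; the `Position` class, the `can_sell` function, and the returned messages are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_HOLD_DAYS = 30


@dataclass
class Position:
    symbol: str
    shares: int
    first_buy_date: date           # the lock is measured from here
    last_transaction_date: date    # tracked separately, not used for the lock


def can_sell(position: Position, shares_to_sell: int, today: date) -> tuple[bool, str]:
    """Return (allowed, reason) for a full or partial sell of a long-term position."""
    if shares_to_sell <= 0 or shares_to_sell > position.shares:
        return False, "invalid share count"
    unlock_date = position.first_buy_date + timedelta(days=MIN_HOLD_DAYS)
    if today < unlock_date:
        return False, f"locked until {unlock_date.isoformat()}"
    return True, "ok"


pos = Position("NVDA", 10, first_buy_date=date(2025, 1, 1), last_transaction_date=date(2025, 1, 20))
print(can_sell(pos, 5, date(2025, 1, 25)))
print(can_sell(pos, 5, date(2025, 1, 31)))
```

Adding to the position on January 20 does not move the unlock date, because only `first_buy_date` enters the calculation.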

4. Multi-API Prompt Consistency

Each AI API has different input formats, token limits, and response structures. Claude handles XML tags well, GPT prefers JSON, and context limits vary from model to model.

Solution: Built abstraction layer with API-specific adapters. Unified prompt template with conditional formatting based on target model.
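
The adapter idea can be illustrated with one shared set of prompt sections and a formatter per model family. The model keys, section names, and the plain-text fallback below are assumptions for illustration; the real adapters also have to handle request and response formats, which this sketch leaves out.

```python
import json


def format_xml(sections: dict[str, str]) -> str:
    """Wrap each section in XML-style tags."""
    return "\n".join(f"<{name}>\n{body}\n</{name}>" for name, body in sections.items())


def format_json(sections: dict[str, str]) -> str:
    """Serialize the sections as a JSON object."""
    return json.dumps(sections, indent=2)


def format_plain(sections: dict[str, str]) -> str:
    """Fallback: upper-case headers followed by the section text."""
    return "\n\n".join(f"{name.upper()}:\n{body}" for name, body in sections.items())


# Hypothetical mapping from model name to formatter.
ADAPTERS = {"claude": format_xml, "gpt-4": format_json}


def render_prompt(model: str, sections: dict[str, str]) -> str:
    """Render the same unified template in the style the target model handles best."""
    return ADAPTERS.get(model, format_plain)(sections)


sections = {"portfolio": "cash 100"}
print(render_prompt("claude", sections))
```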

5. News Relevance Filtering

The initial version ingested all financial news, and 90% of it was noise (e.g., "Paraguay cuts rates" is irrelevant to US equities).

Solution: Implemented a GPT-4 classification layer that tags each article with related stocks before it is stored. Only high-quality, relevant articles are indexed in RAG.
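
The shape of this filter is "classify first, store only what matches the stock pool". In the sketch below the classifier is a pluggable function, and a crude keyword matcher stands in for the GPT-4 call so the example runs offline. The three tickers and the alias table are placeholders, not the actual 22-stock pool.

```python
from typing import Callable, Iterable

STOCK_POOL = {"NVDA", "AAPL", "MSFT"}   # placeholder subset of the pool


def keyword_classifier(headline: str) -> set[str]:
    """Crude stand-in for the LLM classifier: match company names in the text."""
    aliases = {"nvidia": "NVDA", "apple": "AAPL", "microsoft": "MSFT"}
    text = headline.lower()
    return {symbol for name, symbol in aliases.items() if name in text}


def filter_news(
    headlines: Iterable[str],
    classify: Callable[[str], set[str]] = keyword_classifier,
) -> list[dict]:
    """Tag each headline and keep only those tied to a stock in the pool."""
    kept = []
    for headline in headlines:
        tickers = classify(headline) & STOCK_POOL
        if tickers:                      # untagged headlines are never stored
            kept.append({"headline": headline, "tickers": sorted(tickers)})
    return kept


print(filter_news(["Paraguay cuts rates", "Nvidia and Microsoft expand AI deal"]))
```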

6. OpenSearch Vector Index Management

Embedding dimensionality mismatches, index mapping errors, and bulk indexing failures caused data loss.

Solution: Strict schema validation before indexing, retry logic with exponential backoff, separate indices per data type (decisions, reviews, news).
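
Validation before indexing might look like the following sketch. The 1024-dimension check matches the Titan Embeddings V2 setting mentioned earlier; the document fields, index names, and error messages are hypothetical, and the actual bulk call to OpenSearch is left out.

```python
import math

EMBEDDING_DIM = 1024
# Hypothetical index names, one per data type.
INDEX_BY_TYPE = {"decision": "decisions", "review": "reviews", "news": "news"}


def validate_doc(doc: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the doc is indexable."""
    errors = []
    if doc.get("type") not in INDEX_BY_TYPE:
        errors.append("unknown document type")
    if not doc.get("text"):
        errors.append("missing text")
    vec = doc.get("embedding")
    if not isinstance(vec, list) or len(vec) != EMBEDDING_DIM:
        errors.append(f"embedding must be a list of {EMBEDDING_DIM} numbers")
    elif not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec):
        errors.append("embedding contains non-finite or non-numeric values")
    return errors


def partition(docs: list[dict]):
    """Group valid docs by target index and collect rejected ones with reasons."""
    valid = {index: [] for index in INDEX_BY_TYPE.values()}
    rejected = []
    for doc in docs:
        errors = validate_doc(doc)
        if errors:
            rejected.append((doc.get("id"), errors))
        else:
            valid[INDEX_BY_TYPE[doc["type"]]].append(doc)
    return valid, rejected
```

Rejected documents are returned with their reasons rather than dropped, so they can be logged or re-embedded instead of being lost silently.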

7. Frontend Real-Time Updates

Polling the database every second for 6 AIs × 22 stocks (132 combinations) put a heavy load on it.

Solution: Redis caching layer for frequently accessed data (current prices, portfolio values). Cache invalidation on transaction events.
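
The pattern here is cache-aside with explicit invalidation. To keep the example self-contained, an in-memory class stands in for Redis; the key format and the time-to-live are assumptions (the text above only mentions invalidation on transaction events).

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: read through the cache, expire, invalidate."""

    def __init__(self, ttl_seconds: float = 5.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}                 # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()                 # e.g. a database query
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. when a transaction changes a portfolio."""
        self._store.pop(key, None)


cache = TTLCache(ttl_seconds=60)
db_reads = []


def load_portfolio_value():
    db_reads.append(1)                   # count simulated database hits
    return 100_000.0


cache.get_or_load("portfolio:claude", load_portfolio_value)
cache.get_or_load("portfolio:claude", load_portfolio_value)   # served from cache
cache.invalidate("portfolio:claude")                          # a trade happened
cache.get_or_load("portfolio:claude", load_portfolio_value)   # reloaded
print(len(db_reads))
```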

Accomplishments that we're proud of

1. Complete Transparency

Every decision is documented and explained. Unlike black-box trading bots, users can read exactly what the AI was thinking, what data it saw, and why it acted.

2. RAG System That Actually Works

AIs retrieve relevant past decisions before trading. We've seen Claude reference "last time I bought NVDA at the peak" in its reasoning—the memory system works!

3. Enforcing Patience

The 30-day lock and 5-trade limit genuinely force AIs to think long-term. We've observed AIs holding 70%+ cash when uncertain, rather than forcing trades—exactly the discipline retail investors need.

4. Multi-Model Comparison

Running 6 different AIs reveals fascinating strategy differences:

Claude: Ultra-conservative, 75% cash, waits for perfect entries
GPT-4: Balanced, spreads risk across sectors
Gemini: Data-driven, avoids stocks with questionable metrics
Qwen: Technical analysis focus, watches support levels
DeepSeek: Contrarian plays, buys dips aggressively
Grok: Macro-focused, heavy VIX monitoring

5. Token Optimization

A 95% cost reduction while maintaining decision quality is a technical achievement we're genuinely proud of. It makes the project sustainable long-term.

6. Educational Impact

Early testers say they learned more about value investing from reading AI reasoning than from traditional investing courses. That's mission accomplished.

What we learned

Technical Lessons:

Context window management is critical for long-running AI agents. Structured state beats conversation history.
RAG retrieval quality > RAG retrieval quantity. Better to retrieve 5 highly relevant examples than 20 mediocre ones.
API reliability varies wildly. Always build retry logic and fallback mechanisms.
Database design matters for compliance. Tracking first_buy_date separately saved us from wash trade bugs.
Redis is your friend for high-read, low-write data (stock prices, portfolio snapshots).

AI Behavior Insights:

AIs develop "personalities" even with identical prompts. Claude is cautious and DeepSeek is aggressive; each model shows consistent behavioral patterns across decisions.
Reasoning quality improves with constraints. Forcing AIs to explain decisions before executing leads to more thoughtful trades.
AIs don't panic. When VIX spikes, AIs calmly hold positions while humans would sell. This might be their biggest advantage.
Memory helps, but not always. Sometimes AIs over-index on past failures and miss new opportunities. Balance is key.

Product Lessons:

Transparency builds trust. Users love seeing AI "show its work."
Constraints make it relatable. "5 trades/month" resonates because it's achievable for regular people.
Real-time updates are addictive. People check back daily to see AI decisions.
Open source attracts contributors. GitHub stars are climbing, and devs are submitting PRs.

Personal Lessons:

Code won't leave you (unlike some people 😅).
Side projects can become real products if they solve genuine problems.
Building in public creates accountability and accelerates improvement.

What's next for Value Arena

Short-Term (Next 3 Months):

Performance Leaderboard: Public-facing rankings, historical performance charts, risk-adjusted returns (Sharpe ratio)
Weekly Reports: Auto-generated performance summaries with strategy breakdowns
User Portfolios: Let users create their own portfolios and compare against AIs
Mobile App: iOS/Android apps for on-the-go monitoring
Email Notifications: Daily summaries, trade alerts, weekly performance reports

Mid-Term (3-6 Months):

Expand Stock Pool: Add 50+ more stocks based on community voting
Multi-Market Support: A-shares (China), HK stocks, European equities
AI vs Human Competition: Let human investors compete against AIs with same rules
Strategy Templates: Package successful AI strategies as templates users can copy
API Access: Let developers query AI decisions programmatically
Discord/Community: Build a community of AI investing enthusiasts

Long-Term (6-12 Months):

AI Customization: Let users fine-tune AI prompts and risk preferences
Paper Trading Integration: Connect to real brokerage accounts for paper trading
Educational Courses: "Learn Value Investing from AI" video series
Research Papers: Publish findings on AI decision-making under constraints
Enterprise Version: Hedge funds/institutions using Value Arena for AI strategy research
DAO Governance: Community votes on rules, stocks, features

Moonshot Ideas:

Value Arena Season 2: New rules, new AIs (Llama, Mistral, Perplexity), bigger prize pool
Real Money Competition: $10K prize pool, with the top AI's winnings donated to AI safety research
Value Arena Academy: Full investing curriculum taught by AI models
Decentralized Version: On-chain transparency, smart contract enforcement of rules