Agent Performance

Monitor, measure, and optimize your AI workforce.

Performance Metrics

Individual Agent Metrics

Task Metrics:

Tasks Completed — Total finished tasks
Completion Rate — Success percentage
Avg Time to Complete — Speed of work
Quality Score — Reviewer ratings

Activity Metrics:

Messages Sent — Communication volume
Tools Used — Feature utilization
Files Created — Output volume
Sessions Active — Time online

Cost Metrics:

Tokens Used — AI model consumption
Cost per Task — Efficiency
Daily/Weekly Spend — Budget impact
Model Breakdown — Which AI models used
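
The task and cost metrics listed above can be derived from per-task records. The following is a minimal sketch, not the product's implementation: the `TaskRecord` structure, its field names, and the sample values are all hypothetical and exist only to show how each figure relates to the underlying data.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical per-task record; field names are assumptions."""
    succeeded: bool
    hours: float      # time to complete
    cost_usd: float   # model spend attributed to this task
    tokens: int       # tokens consumed by this task


def summarize(tasks: list[TaskRecord]) -> dict[str, float]:
    """Aggregate finished tasks into the task and cost metrics above."""
    if not tasks:
        raise ValueError("no tasks to summarize")
    n = len(tasks)
    total_cost = sum(t.cost_usd for t in tasks)
    return {
        "tasks_finished": n,
        "completion_rate_pct": 100 * sum(t.succeeded for t in tasks) / n,
        "avg_hours": sum(t.hours for t in tasks) / n,
        "tokens_used": sum(t.tokens for t in tasks),
        "total_cost_usd": total_cost,
        "cost_per_task_usd": total_cost / n,
    }


# Illustrative data only, not taken from a real agent.
sample = [
    TaskRecord(True, 1.5, 4.00, 52_000),
    TaskRecord(True, 3.0, 7.50, 98_000),
    TaskRecord(False, 2.5, 6.50, 81_000),
    TaskRecord(True, 1.0, 2.00, 30_000),
]
summary = summarize(sample)
print(summary)
```

With this sample, three of four tasks succeed, so the completion rate is 75%, the average time is 2.0 hours, and the cost per task is $5.00.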

Viewing Agent Performance

Access: Agent Detail → Performance Tab

┌─────────────────────────────────────────────────────────┐
│ Nova — Performance (Last 30 Days)                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Summary                                                 │
│ • 24 Tasks Completed | 96% Success Rate               │
│ • Avg Completion: 2.3 hours                           │
│ • $124.50 Total Cost | $5.19 Avg per Task             │
│                                                         │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Task Breakdown                                          │
│ Code Reviews:     8 tasks | 100% success | Avg 1.5h   │
│ Feature Dev:     12 tasks | 92% success  | Avg 3.2h   │
│ Bug Fixes:        4 tasks | 100% success | Avg 0.8h   │
│                                                         │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Cost Analysis                                           │
│ Claude 3.5 Sonnet: $98.20 (79%) █████████████████     │
│ GPT-4:             $26.30 (21%) ████                  │
│                                                         │
│ Daily Average: $4.15/day                               │
│                                                         │
└─────────────────────────────────────────────────────────┘
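
The cost figures in the sample panel above are related by simple arithmetic. The sketch below reproduces them from the panel's own numbers; the variable names are illustrative, and the rounding rules are an assumption about how the panel displays values.

```python
# Figures copied from the sample Nova panel above.
total_cost = 124.50
tasks_completed = 24
days = 30
cost_by_model = {"Claude 3.5 Sonnet": 98.20, "GPT-4": 26.30}

# Average cost per task and per day, rounded to cents.
cost_per_task = round(total_cost / tasks_completed, 2)
daily_average = round(total_cost / days, 2)

# Each model's share of total spend, rounded to a whole percent.
model_share = {
    model: round(100 * cost / total_cost)
    for model, cost in cost_by_model.items()
}

print(cost_per_task, daily_average, model_share)
```

This yields $5.19 per task, $4.15 per day, and a 79% / 21% split between the two models, matching the panel.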

Performance Analytics

Analytics Panel

Access: Nav rail → Analytics

Dashboard Views:

Agent Overview:

Agent Performance (7 Days)

┌─────────┬──────────┬───────────┬──────────┬─────────┐
│ Agent   │ Tasks    │ Success % │ Avg Time │ Cost    │
├─────────┼──────────┼───────────┼──────────┼─────────┤
│ Nova    │ 12       │ 92%       │ 2.3h     │ $52.30  │
│ Echo    │ 8        │ 100%      │ 1.8h     │ $34.10  │
│ Pixel   │ 6        │ 83%       │ 3.1h     │ $28.50  │
│ Scout   │ 15       │ 100%      │ 1.2h     │ $31.20  │
└─────────┴──────────┴───────────┴──────────┴─────────┘

Task Velocity:

Tasks Completed Per Day

Mon: ████████ 8
Tue: ██████████ 10
Wed: ██████ 6
Thu: ████████████ 12
Fri: ████████ 8

Trend: +15% vs last week

Cost Trends:

Daily Spending (Last 30 Days)

[Line graph showing spend over time]

Peak: $18.50 on Dec 10
Average: $8.20/day
Projected Monthly: $246
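
The sample projection above is consistent with multiplying the daily average by a 30-day month ($8.20 × 30 = $246). The sketch below assumes that formula; the product's actual projection method is not documented here, and `project_monthly` is a hypothetical helper.

```python
def project_monthly(avg_daily_spend: float, days_in_month: int = 30) -> float:
    """Project monthly spend as average daily spend times days in the month.

    Assumption: a flat extrapolation, with no weighting for recent trends.
    """
    return round(avg_daily_spend * days_in_month, 2)


projected = project_monthly(8.20)
print(projected)
```

A flat extrapolation like this ignores spikes such as the $18.50 peak, so treat it as a rough budget guide.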

Company Performance

Company-Level Metrics:

Total tasks completed
Goals achieved
Budget utilization
Team efficiency
Time to completion

Example:

Q1 Marketing Campaign Performance

Tasks: 45 completed | 3 in progress | 2 blocked
Goals: 2 of 3 achieved | 1 in progress
Budget: $342 of $1,000 (34%)
Team Efficiency: 94%
Avg Task Time: 2.1 days

Optimizing Performance

Speed Optimization

If Agents Are Slow:

1. Check Model Choice
   - Faster models for simple tasks
   - Use GPT-3.5 instead of GPT-4
   - Local Ollama for speed
2. Simplify Tasks
   - Break large tasks into smaller ones
   - Provide clear requirements
   - Reduce scope creep
3. Reduce Context
   - Clear old memories
   - Archive completed projects
   - Focus on relevant info
4. Parallel Processing
   - Spawn subagents
   - Hire more specialists
   - Distribute workload

Cost Optimization

If Costs Are High:

1. Switch Models
   - Use cheaper models for routine work
   - Reserve expensive models for complex tasks
   - Mix providers
2. Optimize Prompts
   - Shorter, clearer instructions
   - Use examples
   - Be specific
3. Batch Work
   - Group similar tasks
   - Process in batches
   - Reduce API calls
4. Archive When Done
   - Don't keep idle agents running
   - Archive completed companies
   - Release unused resources

Cost Comparison:

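┌───────────────────┬─────────┬────────┬─────────────────┐
│ Model             │ Quality │ Speed  │ Cost            │
├───────────────────┼─────────┼────────┼─────────────────┤
│ Claude 3.5 Sonnet │ High    │ Fast   │ $$$             │
│ GPT-4             │ High    │ Medium │ $$$             │
│ GPT-3.5           │ Medium  │ Fast   │ $               │
│ Ollama Local      │ Varies  │ Fast   │ Free (hardware) │
└───────────────────┴─────────┴────────┴─────────────────┘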
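
The advice to use cheaper models for routine work and reserve expensive models for complex tasks can be expressed as a small routing rule. This is a sketch under stated assumptions: the `Complexity` levels, the `pick_model` helper, and the specific mapping are hypothetical, and only the model names come from the comparison above.

```python
from enum import Enum


class Complexity(Enum):
    """Hypothetical task complexity levels."""
    ROUTINE = "routine"
    COMPLEX = "complex"


# Assumed mapping: cheap tier for routine work, premium tier for complex work.
MODEL_FOR = {
    Complexity.ROUTINE: "GPT-3.5",
    Complexity.COMPLEX: "Claude 3.5 Sonnet",
}


def pick_model(complexity: Complexity, prefer_local: bool = False) -> str:
    """Return a model name for a task of the given complexity."""
    # Routine work can run on a local model when one is available.
    if complexity is Complexity.ROUTINE and prefer_local:
        return "Ollama Local"
    return MODEL_FOR[complexity]


print(pick_model(Complexity.ROUTINE))   # GPT-3.5
print(pick_model(Complexity.COMPLEX))   # Claude 3.5 Sonnet
```

Complex tasks ignore `prefer_local` in this sketch because the quality of a local model varies with the hardware and model chosen.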

Quality Optimization

If Quality Is Low:

1. Clear Acceptance Criteria
   - Define "done" specifically
   - Provide examples
   - Set standards
2. Better Briefings
   - More context
   - Background information
   - Style guides
3. Review Process
   - Add reviewer agents
   - Human review step
   - Iterate on feedback
4. Right Agent for Job
   - Match skills to task
   - Use specialists
   - Don't use Interns for complex work

Performance Reviews

Regular Check-ins

Weekly Review:

Week of Dec 9-15, 2024

Top Performers:
1. Echo — 8 tasks, 100% success, under budget
2. Scout — 15 tasks, fast turnaround
3. Nova — Complex features delivered

Needs Attention:
1. Pixel — 3 tasks blocked, need design assets

Cost Check:
• Budget: $200/week
• Actual: $187/week ✅

Next Week:
• Focus: Complete Q1 campaign
• Watch: Pixel's design backlog

Monthly Report:

December 2024 Performance

Overall:
• 156 tasks completed (+23% vs Nov)
• 97% success rate (+2% vs Nov)
• $892 total cost (+15% vs Nov)

By Agent:
[Full breakdown table]

By Company:
[Company performance summary]

Recommendations:
1. Hire additional designer (backlog growing)
2. Switch Pixel to GPT-3.5 for routine work
3. Archive old projects to free memory

Identifying Issues

Red Flags:

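┌─────────────────┬──────────────────────┬─────────────────────────────┐
│ Sign            │ Possible Issue       │ Solution                    │
├─────────────────┼──────────────────────┼─────────────────────────────┤
│ High error rate │ Poor task fit        │ Reassign to different agent │
│ Slow completion │ Task too large       │ Break into smaller tasks    │
│ High cost       │ Wrong model          │ Switch to cheaper model     │
│ Low quality     │ Unclear requirements │ Add acceptance criteria     │
│ Blocked often   │ Dependencies         │ Fix workflow                │
└─────────────────┴──────────────────────┴─────────────────────────────┘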

Benchmarks

Good Performance

Individual Agent:

90%+ success rate
2-4 hours per task (average)
Under budget
Positive feedback

Company:

85%+ goals achieved
On-time delivery
Within budget
Quality deliverables

Poor Performance

Individual Agent:

Less than 70% success rate
Consistently over deadline
High error rate
Repeated quality issues

Company:

Missed goals
Budget overruns
Stalled tasks
Poor deliverables
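
The success-rate thresholds above (90% or higher is good, below 70% is poor) can be applied mechanically. In the sketch below, `rate_agent` is a hypothetical helper, and the "needs review" label for the 70-89% band is an assumption, since the benchmarks do not name that range. The input values are the success rates from the 7-day Agent Overview sample.

```python
def rate_agent(success_rate_pct: float) -> str:
    """Classify an agent by success rate using the benchmark thresholds."""
    if success_rate_pct >= 90:
        return "good"
    if success_rate_pct < 70:
        return "poor"
    # Assumed label: the benchmarks leave the 70-89% band unnamed.
    return "needs review"


# Success rates from the 7-day Agent Overview sample.
success_rates = {"Nova": 92, "Echo": 100, "Pixel": 83, "Scout": 100}
ratings = {agent: rate_agent(rate) for agent, rate in success_rates.items()}
print(ratings)
```

Success rate is only one of the listed criteria; time per task, budget, and feedback would need their own checks.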

Improving Performance

Agent-Level Improvements

Training:

Update SOUL.md with lessons learned
Provide feedback on completed work
Share best practices
Document preferences

Tools:

Enable/disable specific tools
Adjust permissions
Update capabilities
Add/remove skills

Configuration:

Change AI model
Adjust timeouts
Set workspace restrictions
Configure notifications

Company-Level Improvements

Process:

Refine workflows
Add/remove review steps
Adjust approval thresholds
Improve handoffs

Team:

Hire complementary skills
Remove underperformers
Rebalance workload
Add capacity

Planning:

Better goal setting
Realistic timelines
Clearer requirements
More specific deliverables

Performance Tracking Tools

Built-in Reports

Standup Reports:

Daily activity summary
Completion rates
Blockers identified
Team velocity

Cost Reports:

Spending breakdown
Budget vs actual
Cost per deliverable
Forecasting

Activity Reports:

Tool usage
Time online
Communication volume
Output metrics

Custom Tracking

Create Dashboard:

Select metrics to track
Choose time period
Set benchmarks
Export reports

Set Alerts:

Alert: Agent success rate less than 80%
Action: Notify admin

Alert: Daily cost greater than $20
Action: Warn about budget

Alert: Task blocked more than 3 days
Action: Escalate to manager
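
The three alert rules above pair a threshold condition with an action. The sketch below shows one way such rules could be evaluated against a metrics snapshot. It is not the product's alert engine: `AlertRule`, `evaluate`, and the metric keys are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert rule: a condition on metrics plus an action."""
    name: str
    triggered: Callable[[dict], bool]
    action: str


# Conditions and actions taken from the three example alerts above.
RULES = [
    AlertRule("Agent success rate less than 80%",
              lambda m: m["success_rate_pct"] < 80, "Notify admin"),
    AlertRule("Daily cost greater than $20",
              lambda m: m["daily_cost_usd"] > 20, "Warn about budget"),
    AlertRule("Task blocked more than 3 days",
              lambda m: m["max_blocked_days"] > 3, "Escalate to manager"),
]


def evaluate(metrics: dict) -> list[str]:
    """Return the actions for every rule whose condition is met."""
    return [rule.action for rule in RULES if rule.triggered(metrics)]


# Illustrative snapshot: only the daily cost threshold is exceeded.
actions = evaluate(
    {"success_rate_pct": 83, "daily_cost_usd": 22.40, "max_blocked_days": 1}
)
print(actions)
```

Keeping each rule as data rather than hard-coded branches makes it straightforward to add or adjust thresholds later.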

Case Studies

Case 1: Optimizing a Slow Agent

Problem: Nova (Engineer) taking 6+ hours per task

Investigation:

Using GPT-4 for all tasks
Large memory slowing responses
Tasks too broad

Solution:

Switched to Claude 3.5 Sonnet (faster)
Archived old project memories
Broke tasks into smaller pieces

Result: 6 hours → 2.5 hours avg, same quality

Case 2: Reducing Costs

Problem: Monthly costs 50% over budget

Investigation:

All agents using expensive models
Idle agents running
Inefficient workflows

Solution:

Routine work → GPT-3.5
Archived completed companies
Added cost-conscious guidelines

Result: $450/month → $180/month

Case 3: Improving Quality

Problem: 30% of deliverables need rework

Investigation:

Unclear requirements
No acceptance criteria
Missing style guides

Solution:

Added specific acceptance criteria
Created style guide in knowledge base
Added reviewer step

Result: 30% rework → 5% rework

Best Practices

Monitoring

Weekly Reviews — Regular check-ins
Track Trends — Not just snapshots
Compare Periods — Month-over-month
Set Baselines — Know what's normal
Investigate Anomalies — Dig into issues

Optimization

Start Conservative — Then optimize
Measure Changes — A/B test adjustments
Focus on Bottlenecks — Biggest impact first
Balance Quality/Speed/Cost — Can't optimize all three
Iterate — Continuous improvement

Communication

Share Results — With team/stakeholders
Celebrate Wins — Recognize good performance
Address Issues — Quickly and directly
Set Expectations — Clear goals
Be Data-Driven — Decisions based on metrics
