Overview

A proposal for priority lanes, Quality of Service (QoS) guarantees, and SLA signaling in message delivery.

Problem Statement

Current SMTP queues are FIFO:

  • All messages treated equally (newsletter = urgent alert)
  • No SLA guarantees
  • No priority routing
  • Batch delivery not optimized

Goals

  1. Multi-lane queuing (urgent, normal, bulk)
  2. SLA-based routing with guarantees
  3. Smart batching for bulk mail
  4. Recipient-based optimization
  5. Cost-aware routing (prefer cheaper routes for low-priority)

Architecture

                    ┌─────────────┐
Incoming Messages───┤ Classifier  ├───┐
                    └─────────────┘   │
                                      │
                    ┌─────────────────┴─────────────────┐
                    │                                   │
             ┌──────▼──────┐   ┌───────▼──────┐  ┌────▼─────┐
             │ Priority    │   │ Normal       │  │ Bulk     │
             │ Queue       │   │ Queue        │  │ Queue    │
             │ SLA: 5min   │   │ SLA: 1hr     │  │ SLA: 24hr│
             │ Max: 1000   │   │ Max: 10000   │  │ Max: ∞   │
             └──────┬──────┘   └───────┬──────┘  └────┬─────┘
                    │                  │               │
                    └──────────────────┴───────────────┘
                                       │
                              ┌────────▼────────┐
                              │ Smart Scheduler │
                              │ - Load balancing│
                              │ - Cost optimize │
                              │ - Batching      │
                              └────────┬────────┘
                                       │
                              ┌────────▼────────┐
                              │  Delivery       │
                              └─────────────────┘

Classification Rules

1. Priority Headers

X-Message-Priority: urgent|normal|bulk
X-SLA-Deadline: 2026-03-07T20:05:00Z
X-Cost-Preference: speed|balanced|economy
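
A receiving system would need to validate these headers before trusting them. The following is a minimal sketch of that step, assuming the three header names above and Python's standard email.message.Message type. The function name read_priority_headers, the defaults ("normal", "balanced"), and the choice to ignore a malformed deadline are illustrative assumptions, not part of the proposal.

```python
from datetime import datetime, timezone
from email.message import Message

VALID_PRIORITIES = {"urgent", "normal", "bulk"}
VALID_COST_PREFS = {"speed", "balanced", "economy"}


def read_priority_headers(msg: Message) -> dict:
    """Extract the three proposed headers, falling back to assumed defaults."""
    priority = (msg.get("X-Message-Priority") or "normal").strip().lower()
    if priority not in VALID_PRIORITIES:
        priority = "normal"  # Unknown value: treat as normal (assumption)

    cost = (msg.get("X-Cost-Preference") or "balanced").strip().lower()
    if cost not in VALID_COST_PREFS:
        cost = "balanced"

    deadline = None
    raw = msg.get("X-SLA-Deadline")
    if raw:
        try:
            # Rewrite the trailing "Z" so older Python versions can parse it.
            deadline = datetime.fromisoformat(raw.strip().replace("Z", "+00:00"))
            if deadline.tzinfo is None:
                # Naive timestamps are assumed to be UTC.
                deadline = deadline.replace(tzinfo=timezone.utc)
        except ValueError:
            deadline = None  # Malformed deadline: ignore it (assumption)

    return {"priority": priority, "cost": cost, "deadline": deadline}


msg = Message()
msg["X-Message-Priority"] = "Urgent"
msg["X-SLA-Deadline"] = "2026-03-07T20:05:00Z"
parsed = read_priority_headers(msg)
```

Normalizing case and falling back to defaults keeps a typo in a sender's header from dropping the message into an undefined lane.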

2. Automatic Classification

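def classify_message(msg):
    # Sender reputation: the score is tracked here, but the lane rules
    # below do not consume it yet
    priority = 0
    if is_verified_sender(msg):
        priority += 10

    # Content analysis (case-insensitive subject match)
    subject = (msg.subject or '').lower()
    if '2fa code' in subject:
        return 'urgent'
    if 'password reset' in subject:
        return 'urgent'
    # .get() avoids a KeyError when the header is absent
    if 'newsletter' in msg.headers.get('List-Unsubscribe', '').lower():
        return 'bulk'

    # Recipient preference: recipients with an SLA go to the urgent lane,
    # matching the urgent|normal|bulk values used elsewhere
    if recipient_has_sla(msg.to):
        return 'urgent'

    # Rich metadata integration
    if msg.has_rich_metadata():
        return msg.metadata.priority.level

    return 'normal'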

Queue Management

Priority Queue

  • Max age: 5 minutes
  • Max size: 1,000 messages
  • Retry: Immediate (every 30s)
  • Alert: If delayed >2 min
  • Cost: Premium routing

Normal Queue

  • Max age: 1 hour
  • Max size: 10,000 messages
  • Retry: Exponential backoff
  • Alert: If delayed >30 min

Bulk Queue

  • Max age: 24 hours
  • Max size: Unlimited
  • Retry: Slow (every 15 min)
  • Batching: Group by recipient domain
  • Cost: Economy routing
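
The three queue policies above can be expressed as data so that admission and alerting share one source of truth. This is a sketch under assumptions: the QueuePolicy type, its field names, the retry labels, and the helper functions are hypothetical. The numbers come from the lists above, and the "standard" route for the normal queue is taken from the Cost Model table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueuePolicy:
    max_age_s: int
    max_size: Optional[int]       # None means unlimited
    retry: str                    # Illustrative label, not a real config value
    alert_after_s: Optional[int]  # None where no alert threshold is listed
    route: str


POLICIES = {
    "priority": QueuePolicy(5 * 60, 1_000, "fixed:30s", 2 * 60, "premium"),
    "normal": QueuePolicy(60 * 60, 10_000, "exponential", 30 * 60, "standard"),
    "bulk": QueuePolicy(24 * 60 * 60, None, "fixed:15m", None, "economy"),
}


def has_capacity(queue: str, depth: int) -> bool:
    """True if a queue at the given depth can accept one more message."""
    limit = POLICIES[queue].max_size
    return limit is None or depth < limit


def should_alert(queue: str, oldest_age_s: float) -> bool:
    """True if the oldest message has waited past the alert threshold."""
    threshold = POLICIES[queue].alert_after_s
    return threshold is not None and oldest_age_s > threshold
```

What should happen when has_capacity returns False (reject, defer, or demote to a slower lane) is left open here, as it is in the lists above.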

Smart Scheduling

Load Balancing

def select_relay(message, available_relays):
    # Factor in:
    # - Current queue depth
    # - Relay performance (latency, success rate)
    # - Cost (premium vs standard)
    # - Recipient's preferred relay
    # - Geographic proximity

    if message.priority == 'urgent':
        return select_fastest_relay(available_relays)
    elif message.priority == 'bulk':
        return select_cheapest_relay(available_relays)
    else:
        return select_balanced_relay(available_relays)
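
select_relay delegates to three helpers that this note does not define. One possible shape for them is sketched below, assuming each relay exposes a latency, a success rate, and a per-message cost. The Relay type, the tie-breaking rules, and the equal-weight balanced score are assumptions. The latency and success-rate figures are made up for illustration; only the costs come from the Cost Model.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    name: str
    latency_ms: float
    success_rate: float
    cost_per_msg: float


def select_fastest_relay(relays):
    # Lowest latency wins; ties go to the higher success rate.
    return min(relays, key=lambda r: (r.latency_ms, -r.success_rate))


def select_cheapest_relay(relays):
    # Lowest cost wins; ties go to the lower latency.
    return min(relays, key=lambda r: (r.cost_per_msg, r.latency_ms))


def select_balanced_relay(relays, latency_weight=0.5):
    # Scale latency and cost by the worst candidate, then take a weighted sum.
    max_latency = max(r.latency_ms for r in relays) or 1.0
    max_cost = max(r.cost_per_msg for r in relays) or 1.0

    def score(r):
        return (latency_weight * r.latency_ms / max_latency
                + (1 - latency_weight) * r.cost_per_msg / max_cost)

    return min(relays, key=score)


relays = [
    Relay("premium", 40, 0.999, 0.02),     # Latency and success rate are made up
    Relay("standard", 120, 0.995, 0.005),
    Relay("economy", 400, 0.98, 0.001),
]
fastest = select_fastest_relay(relays)
cheapest = select_cheapest_relay(relays)
balanced = select_balanced_relay(relays)
```

With these sample figures the three helpers pick the premium, economy, and standard relays respectively. Queue depth, recipient preference, and geographic proximity from the comment list above are not modeled in this sketch.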

Batch Optimization

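import time

MIN_BATCH_SIZE = 100
MAX_WAIT_SECONDS = 60 * 60  # 1 hr

def optimize_bulk_delivery():
    # Group by recipient domain (expects a dict of domain -> messages)
    batches = group_by_domain(bulk_queue)

    for domain, messages in batches.items():
        # Age of the oldest message in this batch; assumes each message
        # records enqueued_at as epoch seconds
        oldest_message_age = time.time() - min(m.enqueued_at for m in messages)

        # Wait for minimum batch size OR max wait time
        if len(messages) >= MIN_BATCH_SIZE or oldest_message_age > MAX_WAIT_SECONDS:
            # Deliver batch in parallel streams
            deliver_batch(messages, parallel_streams=5)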
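
group_by_domain is called above but not defined. A minimal sketch follows, assuming each queued message carries a recipient address in a to attribute and an enqueue time in enqueued_at. The QueuedMessage type and both field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    to: str             # Recipient address (assumed single recipient)
    enqueued_at: float  # Epoch seconds when the message entered the queue


def group_by_domain(messages):
    """Bucket messages by lowercased recipient domain."""
    batches = defaultdict(list)
    for message in messages:
        # rpartition splits on the last "@", so the domain is always the tail.
        _, _, domain = message.to.rpartition("@")
        batches[domain.lower()].append(message)
    return dict(batches)


bulk_queue = [
    QueuedMessage("alice@Example.com", 1000.0),
    QueuedMessage("bob@example.com", 1005.0),
    QueuedMessage("carol@example.org", 1010.0),
]
batches = group_by_domain(bulk_queue)
```

Lowercasing the domain keeps Example.com and example.com in the same batch, which matters because the goal of batching is to reuse one connection per destination.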

SLA Monitoring

Metrics

CREATE TABLE queue_sla_metrics (
    id SERIAL PRIMARY KEY,
    queue_type VARCHAR(20),
    message_count INTEGER,
    avg_queue_time_ms INTEGER,
    p50_queue_time_ms INTEGER,
    p95_queue_time_ms INTEGER,
    p99_queue_time_ms INTEGER,
    sla_violations INTEGER,
    measured_at TIMESTAMP DEFAULT NOW()
);
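
A row for this table could be computed from raw per-message queue times roughly as follows. This is a sketch, not a prescribed method: the summarize and percentile helpers are hypothetical, the nearest-rank percentile definition is one of several reasonable choices, and the sample data is synthetic.

```python
import math
from statistics import mean


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(queue_type, queue_times_ms, sla_ms):
    """Build one queue_sla_metrics row (minus id and measured_at)."""
    values = sorted(queue_times_ms)
    if not values:
        raise ValueError("no samples to summarize")
    return {
        "queue_type": queue_type,
        "message_count": len(values),
        "avg_queue_time_ms": round(mean(values)),
        "p50_queue_time_ms": percentile(values, 50),
        "p95_queue_time_ms": percentile(values, 95),
        "p99_queue_time_ms": percentile(values, 99),
        # A message violates the SLA when its queue time exceeds the limit.
        "sla_violations": sum(1 for v in values if v > sla_ms),
    }


# Synthetic sample: 100 messages queued for 1 s, 2 s, ..., 100 s.
samples = [1_000 * i for i in range(1, 101)]
row = summarize("priority", samples, sla_ms=90_000)
```

The id and measured_at columns are left to the database defaults declared in the table definition.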

Alerts

# Alert if SLA violated
if priority_queue.p95_time > 300000:  # 5 min
    alert('Priority queue SLA violation: p95 > 5min')

if normal_queue.p95_time > 3600000:  # 1 hr
    alert('Normal queue SLA violation: p95 > 1hr')

Integration with msgs.global

Postfix Configuration

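# /etc/postfix/main.cf
# Multi-instance queues: this parameter lists each instance's configuration
# directory. The spool path (for example /var/spool/postfix-priority) is set
# per instance with queue_directory in that instance's own main.cf.
multi_instance_enable = yes
multi_instance_directories = /etc/postfix-priority,
                              /etc/postfix-normal,
                              /etc/postfix-bulk

# Custom transport
transport_maps = hash:/etc/postfix/transport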

Transport Maps

# Priority users/domains (a leading dot matches subdomains; "*." is not a valid pattern)
urgent@msgs.global     smtp-priority:
.priority.msgs.global  smtp-priority:

# Bulk senders
newsletter@msgs.global smtp-bulk:
bulk.msgs.global       smtp-bulk:

# Default
*                      smtp-normal:

Flask API

from flask import Flask, request

app = Flask(__name__)

@app.route('/api/v1/queue/status')
def queue_status():
    return {
        'priority': {
            'depth': get_queue_depth('priority'),
            'avg_age_ms': get_avg_age('priority'),
            'sla_compliance': get_sla_compliance('priority')
        },
        'normal': {...},
        'bulk': {...}
    }

@app.route('/api/v1/queue/classify', methods=['POST'])
def classify_message_api():
    msg_data = request.json
    classification = classify_message(msg_data)
    return {'queue': classification, 'sla': get_sla(classification)}

Cost Model

Queue      Route            Cost/Message   SLA
Priority   Premium relay    $0.02          5 min
Normal     Standard relay   $0.005         1 hr
Bulk       Economy relay    $0.001         24 hr
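
To see what the tiers mean in practice, the per-message prices can be applied to a traffic mix. The sketch below uses the prices from the table; the daily volumes are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from decimal import Decimal

# Per-message prices from the Cost Model table.
COST_PER_MESSAGE = {
    "priority": Decimal("0.02"),
    "normal": Decimal("0.005"),
    "bulk": Decimal("0.001"),
}


def daily_cost(volumes):
    """Total cost in dollars for a mapping of queue name -> message count."""
    return sum(
        (COST_PER_MESSAGE[queue] * count for queue, count in volumes.items()),
        Decimal("0"),
    )


# Hypothetical daily mix, not measured traffic.
volumes = {"priority": 1_000, "normal": 20_000, "bulk": 200_000}

tiered = daily_cost(volumes)
all_premium = COST_PER_MESSAGE["priority"] * sum(volumes.values())
```

For this made-up mix, tiered routing comes to $320 per day (20 + 100 + 200), against $4,420 if all 221,000 messages used the premium relay. Decimal is used instead of float so that the cent-level sums stay exact.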

Status

📋 Research Phase

Next Steps

  1. Design multi-instance Postfix setup
  2. Implement classification engine
  3. Build SLA monitoring dashboard
  4. Test batch optimization algorithms
  5. Define cost model and routing rules