Observability

Strategy

Start with lightweight, self-hosted observability using open-source tooling. The observability layer is modular — logging, metrics, tracing, and client error reporting are isolated behind interfaces so the underlying tools can be swapped without affecting application code.
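
As a minimal sketch of what "isolated behind interfaces" could look like, the snippet below uses a hypothetical `LogSink` trait with two interchangeable implementations. The names `LogSink`, `StdoutSink`, `MemorySink`, and `submit_score` are illustrative assumptions, not part of the codebase. In the real server the `tracing` facade would play this role; the sketch uses only the standard library to show the idea.

```rust
use std::sync::Mutex;

/// Hypothetical interface that application code depends on.
pub trait LogSink: Send + Sync {
    fn emit(&self, level: &str, message: &str);
}

/// Writes entries to stdout, e.g. for a container log collector to pick up.
pub struct StdoutSink;

impl LogSink for StdoutSink {
    fn emit(&self, level: &str, message: &str) {
        println!("level={} message={:?}", level, message);
    }
}

/// Keeps entries in memory, which is convenient in tests.
#[derive(Default)]
pub struct MemorySink {
    entries: Mutex<Vec<(String, String)>>,
}

impl MemorySink {
    /// Returns a copy of everything emitted so far.
    pub fn entries(&self) -> Vec<(String, String)> {
        self.entries.lock().unwrap().clone()
    }
}

impl LogSink for MemorySink {
    fn emit(&self, level: &str, message: &str) {
        self.entries
            .lock()
            .unwrap()
            .push((level.to_string(), message.to_string()));
    }
}

/// Application code only sees the trait, never a concrete backend.
pub fn submit_score(sink: &dyn LogSink, score: u32) {
    sink.emit("info", &format!("score created: {}", score));
}
```

Calling `submit_score(&StdoutSink, 42)` and `submit_score(&MemorySink::default(), 42)` exercises the same application code against two backends, which is the property the strategy relies on.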

Server-Side

Stack

Concern	Tool	Role
Structured logging	`tracing` (Rust crate)	Structured, leveled logs emitted from server code
Distributed tracing	`tracing` + OpenTelemetry	Request-scoped trace spans across async operations
Metrics	Prometheus	Counter, gauge, and histogram metrics scraped from the server
Log aggregation	Loki	Collects and indexes structured logs
Dashboards	Grafana	Visualizes logs (Loki), metrics (Prometheus), and traces

Why This Stack

tracing is the Rust ecosystem standard — Axum, Tonic, and Dioxus all integrate with it natively. One instrumentation library covers logging, spans, and metrics export.
OpenTelemetry is the vendor-neutral standard for telemetry. Exporting via OTLP means switching backends (e.g., from Loki to Datadog, or from self-hosted to managed) requires only configuration changes.
Grafana + Loki + Prometheus are self-hosted, free, and run as Docker containers alongside the server — fits the existing containerized deployment model.

Server Crate Structure

Observability is an isolated module in the server crate:

apps/server/
  src/
    auth/
    billing/
    observability/        # tracing setup, metrics registration, OTLP export
    handlers/
    ...

Application code uses tracing macros (info!, warn!, error!, #[instrument]) and never references the observability backend directly. Swapping Loki for a managed service means changing configuration in observability/, not application code.

What to Instrument

Request tracing:

Every ConnectRPC call gets a trace span, created once in Tonic middleware rather than in each handler
Spans include: method name, user_id, org_id, duration, status code

Key operations:

Auth flows (login, token refresh, invite upgrade)
Score submission (including dedup checks)
Sync operations (queue flush, read-down)
Billing webhook processing
Database query duration

Metrics:

Request rate and latency per RPC method
Error rate by error code
Active connections / concurrent streams
Score submission volume
Queue flush success/failure rate
Database connection pool utilization
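
To make the latency metric above concrete, here is a small sketch of a histogram with cumulative buckets, rendered in the Prometheus text exposition format. It is an illustration only: the `Histogram` type and the metric name `rpc_duration_seconds` are assumptions, labels such as the RPC method are omitted for brevity, and a real server would more likely use an existing Prometheus client crate than hand-roll this.

```rust
use std::fmt::Write;

/// Minimal latency histogram with Prometheus-style cumulative buckets.
pub struct Histogram {
    name: String,
    bounds: Vec<f64>, // bucket upper bounds in seconds, ascending
    counts: Vec<u64>, // per-bucket counts; the extra last slot is the +Inf bucket
    sum: f64,
    total: u64,
}

impl Histogram {
    pub fn new(name: &str, bounds: &[f64]) -> Self {
        Self {
            name: name.to_string(),
            bounds: bounds.to_vec(),
            counts: vec![0; bounds.len() + 1],
            sum: 0.0,
            total: 0,
        }
    }

    /// Records one observation in the first bucket whose bound is >= the value.
    pub fn observe(&mut self, seconds: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|b| seconds <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += seconds;
        self.total += 1;
    }

    /// Renders the histogram as Prometheus text exposition lines.
    pub fn render(&self) -> String {
        let mut out = String::new();
        writeln!(out, "# TYPE {} histogram", self.name).unwrap();
        let mut cumulative: u64 = 0;
        for (bound, count) in self.bounds.iter().zip(&self.counts) {
            // Prometheus buckets are cumulative: each includes all smaller ones.
            cumulative += *count;
            writeln!(out, "{}_bucket{{le=\"{}\"}} {}", self.name, bound, cumulative).unwrap();
        }
        writeln!(out, "{}_bucket{{le=\"+Inf\"}} {}", self.name, self.total).unwrap();
        writeln!(out, "{}_sum {}", self.name, self.sum).unwrap();
        writeln!(out, "{}_count {}", self.name, self.total).unwrap();
        out
    }
}
```

Request rate falls out of the `_count` series and latency percentiles out of the `_bucket` series when queried in Prometheus, so one histogram per RPC method covers the first item in the list.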

Structured Log Format

All logs are structured JSON via tracing:

{
  "timestamp": "2026-05-15T14:30:00Z",
  "level": "info",
  "message": "score created",
  "span": {"method": "ScoreService/CreateScore", "trace_id": "abc123"},
  "fields": {"event_id": "018f...", "shooter_id": "018f...", "org_id": "018f..."}
}

No unstructured string logs. Every log entry is queryable in Loki.
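
The shape of the log entry above can be sketched with a few standard-library helpers. This is only an illustration of the target format: `json_escape`, `json_object`, and `log_line` are hypothetical names, and in the actual server the JSON formatter of the `tracing` subscriber would be expected to produce the output rather than hand-written code.

```rust
/// Escapes a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders string key/value pairs as a flat JSON object.
fn json_object(pairs: &[(&str, &str)]) -> String {
    let body: Vec<String> = pairs
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
        .collect();
    format!("{{{}}}", body.join(","))
}

/// Builds one structured log line in the layout shown above.
pub fn log_line(
    timestamp: &str,
    level: &str,
    message: &str,
    span: &[(&str, &str)],
    fields: &[(&str, &str)],
) -> String {
    format!(
        "{{\"timestamp\":\"{}\",\"level\":\"{}\",\"message\":\"{}\",\"span\":{},\"fields\":{}}}",
        json_escape(timestamp),
        json_escape(level),
        json_escape(message),
        json_object(span),
        json_object(fields)
    )
}
```

Because every value lands under a named key, a query in Loki can filter on `level`, on a span attribute such as `method`, or on any entry in `fields` without parsing free text.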

Client-Side

Mobile Apps (iOS + Android)

Native crash and error reporting, plus custom event tracking:

Concern	iOS	Android
Crash reporting	Native crash logs (PLCrashReporter or similar)	Native crash logs (uncaught exception handler)
Error events	Reported to server via a lightweight reporting RPC	Same
Custom events	Sync failures, queue flush outcomes, offline duration	Same

Client error reports are sent to the server via a dedicated ConnectRPC service:

service TelemetryService {
  rpc ReportError(ErrorReport) returns (ReportResponse);
  rpc ReportEvent(EventReport) returns (ReportResponse);
}

This keeps client telemetry in the same infrastructure — no separate third-party SDK required at MVP. If a dedicated service like Sentry is adopted later, the client-side reporting interface stays the same; only the server-side handler changes.

PWA

Browser-level error capture via window.onerror and unhandledrejection. Reported through the same TelemetryService RPC when online. Limited compared to native — no background crash reporting.

What to Report from Clients

Crashes — stack trace, device info, app version
Sync failures — RPC method, error code, retry count, offline duration
Queue state — queue depth at flush, flush success/failure
Performance — app startup time, time-to-interactive
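
One way the server-side handler might model these four report categories is as a single enum, sketched below. All type, variant, and field names are assumptions chosen to mirror the list above. The actual `ErrorReport` and `EventReport` message definitions are not specified in this document, and the level mapping is only one plausible policy.

```rust
/// Hypothetical server-side view of a client report.
#[derive(Debug, Clone, PartialEq)]
pub enum ClientReport {
    Crash {
        stack_trace: String,
        device_info: String,
        app_version: String,
    },
    SyncFailure {
        rpc_method: String,
        error_code: String,
        retry_count: u32,
        offline_secs: u64,
    },
    QueueState {
        depth_at_flush: u32,
        flush_succeeded: bool,
    },
    Performance {
        startup_ms: u64,
        time_to_interactive_ms: u64,
    },
}

impl ClientReport {
    /// Stable label, usable as a log field or a metric label value.
    pub fn kind(&self) -> &'static str {
        match self {
            ClientReport::Crash { .. } => "crash",
            ClientReport::SyncFailure { .. } => "sync_failure",
            ClientReport::QueueState { .. } => "queue_state",
            ClientReport::Performance { .. } => "performance",
        }
    }

    /// Log level the handler could assign when re-emitting the report.
    pub fn level(&self) -> &'static str {
        match self {
            ClientReport::Crash { .. } => "error",
            ClientReport::SyncFailure { .. } => "warn",
            // A failed flush is worth surfacing; a successful one is routine.
            ClientReport::QueueState { flush_succeeded: false, .. } => "warn",
            ClientReport::QueueState { .. } | ClientReport::Performance { .. } => "info",
        }
    }
}
```

Re-emitting each report through the server's own structured logging at the level returned by `level()` would put client telemetry in the same Loki and Grafana views as server logs.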

Deployment

Observability services run as containers alongside the application:

# Added to docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
 
  loki:
    image: grafana/loki:latest
 
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"

For white-label deployments, observability containers are optional — the customer can include them or point to their own monitoring infrastructure.

Modularity

The design is intentionally layered to allow swapping components:

Layer	Current	Could Swap To
Instrumentation	`tracing` + OpenTelemetry	No reason to swap — this is the standard
Log backend	Loki	Datadog, Elasticsearch, CloudWatch
Metrics backend	Prometheus	Datadog, CloudWatch, managed Prometheus
Trace backend	Grafana (via OTLP)	Jaeger, Datadog, Honeycomb
Client reporting	Custom `TelemetryService` RPC	Sentry SDK, Crashlytics
Dashboards	Grafana	Datadog dashboards, managed Grafana

Swapping any backend is a configuration change in the observability/ module or docker-compose.yml. Application code is unaffected.
