OpenJet

Offline terminal-based agent for self-hosted LLMs. Supports unified-memory systems (NVIDIA Jetson), CUDA, ROCm, and Vulkan backends.

OpenJet terminal UI showing local tool execution

Impact

  • 39 tok/s decode speed for Qwen3.5-27B on an RTX 3090
  • Automated local model setup via llama.cpp with hardware profiling for optimised configurations
  • Memory management to avoid OOM errors: automatic model offloading, context-size calculation, and automatic context compression
  • Designed with edge devices and IoT in mind: users can register I/O hardware, OpenJet logs its output, and the local LLM reads the log and runs appropriate tools
  • Supports agent tool execution, local context management, and approval gates for state-changing actions
  • Python SDK exposes hardware profiling, background agent orchestration, and model tok/s benchmark parameter sweeps
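To illustrate the benchmark parameter sweeps the SDK exposes, here is a minimal sketch of sweeping llama.cpp-style parameters and ranking configurations by decode speed. All names here (`sweep`, `SweepResult`, `fake_run`) are illustrative, not the SDK's actual API; a real run would call a local llama-server instead of the stand-in function.

```python
import itertools
from dataclasses import dataclass

@dataclass
class SweepResult:
    n_gpu_layers: int
    batch_size: int
    tok_per_s: float

def sweep(run_once, gpu_layer_opts, batch_opts):
    """Run one benchmark per parameter combination and return the
    results sorted by decode speed, fastest first."""
    results = [
        SweepResult(g, b, run_once(n_gpu_layers=g, batch_size=b))
        for g, b in itertools.product(gpu_layer_opts, batch_opts)
    ]
    return sorted(results, key=lambda r: r.tok_per_s, reverse=True)

# Stand-in for a real timed generation against a local llama-server:
def fake_run(n_gpu_layers, batch_size):
    return n_gpu_layers * 0.5 + batch_size * 0.01

best = sweep(fake_run, [0, 20, 40], [256, 512])[0]
print(best.n_gpu_layers, best.batch_size)  # fastest configuration found
```
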

Tools

Key decisions

  • Most agentic coding systems still depend on cloud connectivity for orchestration or inference, which breaks in offline and restricted environments.
  • Running an OS-level LLM agent directly on self-owned edge hardware enables local device control without sending code, logs, or shell state to external services.
  • Security posture improves when execution, model weights, and tool outputs stay local, with explicit approval gates for mutating actions.
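The approval-gate idea above can be sketched as a dispatch wrapper in which state-changing tools require an explicit confirmation callback before running; the tool names and the `execute_tool` helper are hypothetical, not OpenJet's implementation.

```python
# Illustrative tool registry; real tools would touch the filesystem/shell.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "write_file": lambda path, text: f"wrote {len(text)} bytes to {path}",
}

# Tools that mutate state and therefore require approval.
MUTATING = {"write_file", "run_shell", "delete"}

def execute_tool(name, args, approve):
    """Run a tool call; state-changing tools require explicit approval.
    `approve` is a callback (e.g. a TUI prompt) returning True/False."""
    if name in MUTATING and not approve(name, args):
        return {"status": "denied", "tool": name}
    return {"status": "ok", "tool": name, "result": TOOLS[name](**args)}
```

Read-only tools pass through without prompting, so the gate adds friction only where an action could change local state.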

Technical approach

  • A setup wizard profiles target hardware, tunes GPU layer offload, and recommends model sizing.
  • Inference runs through local llama-server using quantized GGUF models from local files or Ollama pulls.
  • The Textual TUI orchestrates chat, slash commands, file mentions, and tool execution requests.
  • Session events and resource telemetry are written as structured logs for observability and replay.
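As a sketch of the kind of heuristic a setup wizard might use to tune GPU layer offload, the following estimates how many layers fit in VRAM, assuming roughly equal-sized layers and reserving headroom for the KV cache. The function and its parameters are illustrative, not OpenJet's actual profiling code.

```python
def gpu_layers_that_fit(vram_mb, model_mb, n_layers, reserve_mb=1024):
    """Estimate how many transformer layers can be offloaded to the GPU.

    Assumes layers are roughly equal in size and reserves `reserve_mb`
    of VRAM for the KV cache and scratch buffers. A heuristic sketch,
    not a precise memory model.
    """
    per_layer_mb = model_mb / n_layers
    usable = max(vram_mb - reserve_mb, 0)
    return min(n_layers, int(usable // per_layer_mb))
```

For example, an 18 GB quantized model with 60 layers fits fully on a 24 GB card, while an 8 GB card would offload only a subset and leave the rest on the CPU.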

Project links