
Goal

We’re building Luna, an affordable and intelligent local AI assistant (think ChatGPT or Claude) that runs entirely on a sub-$200 device.

Given the inherent limitations of the hardware, only Small Language Models can be run, specifically those under 3 billion parameters.

In this mini research project, we set out to answer:

Experimental Setup

  1. Agent framework: We use smolagents (https://github.com/huggingface/smolagents), which lets a model write and execute code to solve problems
  2. Models: We test a range of language models
    1. Open-source models: everything from SmolLM2 135M to Phi-4
    2. Closed-source models: o3-mini and GPT-4.1
  3. Compute: LLM inference runs on a variety of services and hardware to balance cost and speed
    1. My personal laptop with an NVIDIA GTX 1050 Ti (free)
    2. Microsoft Azure VM (8 vCPUs + 32 GB RAM) (free with Startup Credits)
    3. Vast.ai ($10 spent on an NVIDIA RTX 4090)
    4. Ori.co ($10 spent on an NVIDIA A16)
    5. Microsoft Azure AI Foundry with GPT-4.1 and o3-mini (free with Startup Credits)
    6. Cloudflare Workers AI (free with Startup Credits)
  4. Evaluation: OpenAI o3-mini serves as the LLM-as-a-judge
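The LLM-as-a-judge step can be sketched roughly as follows: the judge model (o3-mini) receives the task and the candidate answer, and replies with a structured verdict that we parse into a numeric score. This is a minimal illustrative sketch, not the actual evaluation harness; the prompt wording, JSON fields, and function names are assumptions, and the judge's reply is mocked rather than fetched from the API.

```python
import json

def build_judge_prompt(task: str, answer: str) -> str:
    """Compose the grading prompt sent to the judge model (illustrative wording)."""
    return (
        "You are grading an AI assistant's answer.\n"
        f"Task: {task}\n"
        f"Answer: {answer}\n"
        'Reply with JSON only: {"score": <integer 1-10>, "reason": "<short justification>"}'
    )

def parse_verdict(judge_reply: str) -> tuple[int, str]:
    """Extract the score and reason from the judge's JSON reply."""
    verdict = json.loads(judge_reply)
    score = int(verdict["score"])
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    return score, verdict["reason"]

# Example with a mocked judge reply (no API call is made here):
prompt = build_judge_prompt("What is 12 * 9?", "108")
mocked_reply = '{"score": 10, "reason": "Correct and concise."}'
score, reason = parse_verdict(mocked_reply)
print(score, reason)  # → 10 Correct and concise.
```

Parsing a strict JSON verdict (rather than free-form prose) makes scores easy to aggregate across 183 tasks and lets malformed judge replies fail loudly instead of silently skewing results.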

Tasks

We prepared a set of 183 tasks across 8 categories:

  1. Mathematics & Quantitative Reasoning
  2. Science & Technical Knowledge
  3. Language & Communication