DEV Community

Nilesh Raut


How To Run LLMs Locally

If you use the Claude API, the OpenAI API, Cursor, or other AI coding tools every day, your AI bill can grow very fast.

A lot of developers are now moving to local LLM setups because they want:

  • Lower AI costs
  • Offline AI access
  • Better privacy
  • Faster experimentation
  • No API limits

The good news is that you can now run capable AI models directly on your laptop with tools like Ollama.

This setup works great for:

  • Coding help
  • Refactoring
  • Learning
  • Documentation
  • AI chat
  • Small local agents

Let’s set it up step by step.


Step 1: Install Ollama

Download Ollama from ollama.com and install it like any other application.

After installation, open a terminal (Command Prompt or PowerShell on Windows, Terminal on macOS or Linux) and run:

ollama --version

If you see a version number, it is installed correctly.
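A version number only proves the CLI is installed; chatting also needs the background Ollama server. The shell sketch below checks both. It assumes Ollama's default address, http://localhost:11434, and the names OLLAMA_URL and check_ollama are hypothetical, introduced here for illustration rather than provided by Ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ollama): checks that the CLI is installed
# and that the local Ollama server answers on its default port.
# OLLAMA_URL is an assumption for this sketch; 11434 is Ollama's default port.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

check_ollama() {
  # Is the ollama command on PATH?
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama: not found on PATH" >&2
    return 1
  fi
  ollama --version

  # Does the server respond? GET /api/version returns a small JSON object.
  if curl -fsS -o /dev/null "$OLLAMA_URL/api/version"; then
    echo "server: reachable at $OLLAMA_URL"
  else
    echo "server: not reachable; start it with 'ollama serve'" >&2
    return 1
  fi
}
```

Source the file and run check_ollama. Desktop installs usually start the server for you; if not, ollama serve starts it in the current terminal.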


Step 2: Download Your First AI Model

Now pull a model locally.

Example:

ollama pull llama3

Or for coding:

ollama pull qwen2.5-coder:7b

The first download may take a few minutes because models are several GB in size.
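Downloaded models stay on disk, so a setup script does not need to pull them again. The sketch below wraps ollama pull so it only downloads a model that ollama list does not already show. ensure_model is a hypothetical helper; it assumes ollama list prints a header row followed by one model per line with the name in the first column, and that an untagged name such as llama3 is stored as llama3:latest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull a model only if it is not already downloaded.
ensure_model() {
  local model="$1"
  # Ollama stores an untagged name such as "llama3" as "llama3:latest".
  case "$model" in
    *:*) ;;
    *) model="$model:latest" ;;
  esac

  # Skip the header row of `ollama list` and compare against the NAME column.
  if ollama list | awk 'NR > 1 { print $1 }' | grep -Fxq "$model"; then
    echo "$model already present"
  else
    ollama pull "$model"
  fi
}
```

For example, ensure_model qwen2.5-coder:7b. To free disk space later, ollama rm followed by the model name deletes a downloaded model.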


Step 3: Run the Model

Start chatting with the model:

ollama run llama3

Type your question at the >>> prompt (enter /bye to exit). For example:

>>> Explain Docker in simple words

You now have a local AI assistant running directly on your machine.

No cloud API key or subscription required.
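ollama run also accepts the prompt as an argument; in that case it prints one answer and exits instead of opening the >>> session, which makes it easy to script. The function below is a hypothetical example (explain_file is not an Ollama command) that sends a file's contents along with a question. It assumes the file is small enough to fit in the model's context window.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask a local model to explain a file, one-shot.
# Usage: explain_file MODEL FILE
explain_file() {
  local model="$1" file="$2"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  # With a prompt argument, `ollama run` answers once and exits.
  ollama run "$model" "Explain what this file does in simple words:
$(cat "$file")"
}
```

For example, explain_file llama3 Dockerfile prints the model's explanation and returns to your shell.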


Step 4: Use It Inside VS Code

Install one of these VS Code extensions:

  • Continue.dev
  • Cline

Both work with Ollama locally.

In the Continue.dev config, add your local model:

{
  "models": [
    {
      "title": "Local AI",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ]
}

Now VS Code can use your local model for:

  • Code generation
  • Refactoring
  • Debugging
  • Chat
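If Continue cannot reach the model, check that the name in the config exactly matches a model you have pulled. The sketch below compares the two. check_continue_models is a hypothetical helper; it assumes the JSON config format shown above, stored at ~/.continue/config.json (newer Continue releases may keep their config elsewhere or in another format), and it extracts "model" values with sed rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each "model" in a Continue config.json
# has been pulled into Ollama. Uses sed, so it is a sketch, not a JSON parser.
check_continue_models() {
  local config="${1:-$HOME/.continue/config.json}"
  local installed tag
  # Model names from `ollama list`, header row skipped.
  installed="$(ollama list | awk 'NR > 1 { print $1 }')"

  sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" |
    while IFS= read -r model; do
      # Untagged names are stored as NAME:latest.
      case "$model" in
        *:*) tag="$model" ;;
        *) tag="$model:latest" ;;
      esac
      if printf '%s\n' "$installed" | grep -Fxq "$tag"; then
        echo "ok: $model"
      else
        echo "missing: $model (run: ollama pull $model)"
      fi
    done
}
```

Run check_continue_models with no argument to read the default path, or pass a config file path explicitly.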

Step 5: Use a Chat UI in Your Browser

You can also use a ChatGPT-like interface locally.

If you have Docker installed, run Open WebUI in a container. It connects to the Ollama server running on your machine:

docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main

Then open this address in your browser:

http://localhost:3000

Now you have your own private AI chat app.
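The container can take a little while to start, so the page may not load on the first try. A small polling loop like the one below waits until http://localhost:3000 answers. wait_for_url is a hypothetical helper, and the 2-second interval and default of 30 attempts are arbitrary choices for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it responds or we run out of attempts.
# Usage: wait_for_url URL [ATTEMPTS]
wait_for_url() {
  local url="$1" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures; -s: no progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "gave up on $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_url http://localhost:3000 && echo "Open WebUI is ready". If the page loads but shows no models, docker logs open-webui is a good first place to look.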


Recommended Models

Model           | Best For
----------------|-----------------
Qwen2.5 Coder   | Coding
DeepSeek Coder  | Refactoring
Llama 3         | General AI
Phi             | Low-end laptops
Mistral         | Fast responses
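To try several of these side by side, you can pull them in one loop. The Ollama tag names below (qwen2.5-coder:7b, deepseek-coder, llama3, phi3, mistral) are assumed mappings for each row; check the Ollama model library for current names and sizes before running it, because each download is several GB. pull_models is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumed Ollama tags for each table row (verify against the Ollama library).
MODELS='qwen2.5-coder:7b|Coding
deepseek-coder|Refactoring
llama3|General AI
phi3|Low-end laptops
mistral|Fast responses'

# Hypothetical helper: pull every model listed above, continuing past failures.
pull_models() {
  printf '%s\n' "$MODELS" | while IFS='|' read -r tag purpose; do
    echo "==> $tag ($purpose)"
    # </dev/null keeps `ollama pull` from consuming the loop's input.
    ollama pull "$tag" </dev/null || echo "failed: $tag" >&2
  done
}
```

Trim the MODELS list to the rows you actually need before calling pull_models.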

Minimum Hardware

Basic setup:

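  • 16GB RAM recommended (smaller models such as Phi need less)
  • SSD storage, since each model is several GB
  • A GPU helps a lot, especially an NVIDIA card

CPU-only works too, but responses are noticeably slower.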
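A rough way to reason about RAM: the weights alone need about (parameters × bits per weight ÷ 8) bytes. Many default Ollama tags are 4-bit quantized, so a 7B model needs roughly 3.5 GB for its weights, before the extra memory used for context. The helper below only does that arithmetic; estimate_weights_gb is a hypothetical name, and the result is a lower bound, not a measurement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough memory needed for model weights only.
# $1 = parameters in billions, $2 = bits per weight (e.g. 4 for 4-bit quantization).
# 1e9 params * bits / 8 bits-per-byte = bytes; dividing by 1e9 gives GB,
# so the billions cancel and GB = params_in_billions * bits / 8.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

estimate_weights_gb 7 4 prints 3.5. To see where a loaded model is actually running, ollama ps shows a PROCESSOR column (GPU, CPU, or a split between them), and nvidia-smi shows GPU memory use on NVIDIA cards.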


Why Developers Like Local AI

Main reasons:

  • No monthly API bills
  • More privacy
  • Works offline
  • Full control
  • Easy experimentation

For daily coding workflows, local LLMs are becoming surprisingly useful.

Cloud models are still stronger for advanced reasoning, but local models are now good enough for many everyday tasks such as explanations, documentation, and small refactors.


Final Thoughts

If you are spending too much on AI APIs, this is probably the easiest way to reduce costs.

Start simple:

  • Install Ollama
  • Pull one coding model
  • Connect it to VS Code

That alone can cover a good share of routine daily AI usage, such as quick questions, explanations, and small code edits.

