Nilesh Raut

Posted on May 21 • Edited on May 25

How to Run LLMs Locally

#llm #ai #tutorial #learning

If you use the Claude API, the OpenAI API, Cursor, or other AI coding tools every day, your API bill can grow very fast.

A lot of developers are now moving to local LLM setups because they want:

Lower AI costs
Offline AI access
Better privacy
Faster experimentation
No API limits

The good news is that you can now run capable AI models directly on your laptop using tools like Ollama.

This setup works great for:

Coding help
Refactoring
Learning
Documentation
AI chat
Small local agents

Let’s set it up step by step.

Step 1: Install Ollama

Download Ollama from its official website.

Install it like any other application.

After installation, open Command Prompt (Windows) or a terminal and check the version:

ollama --version

If you see a version number, it is installed correctly.
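
If you want to run the same check from a script (for example in a setup helper), here is a minimal Python sketch. The function name `ollama_version` is just an illustration, and it assumes the `ollama` binary is on your PATH.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version(binary: str = "ollama") -> Optional[str]:
    """Return the text printed by `<binary> --version`, or None if not installed."""
    path = shutil.which(binary)  # look the binary up on PATH
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=10
    )
    # Use stdout if present, otherwise fall back to stderr.
    output = (result.stdout or result.stderr).strip()
    return output or None


if __name__ == "__main__":
    version = ollama_version()
    print(version if version else "ollama not found on PATH")
```

If it prints "ollama not found on PATH", reopen your terminal or reinstall before moving on.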

Step 2: Download Your First AI Model

Now pull a model locally.

Example:

ollama pull llama3

Or for coding:

ollama pull qwen2.5-coder:7b

The first download may take a few minutes because models are several GB in size.
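
To see which models you have pulled and how much disk space they take, you can ask the local Ollama server. The sketch below assumes Ollama is running on its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint. The helper names are made up for this example.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def summarize_models(payload: dict) -> List[str]:
    """Turn an /api/tags response into 'name: size' lines."""
    lines = []
    for model in payload.get("models", []):
        size_gb = model.get("size", 0) / 1e9  # size is reported in bytes
        lines.append(f"{model['name']}: {size_gb:.1f} GB")
    return lines


def fetch_local_models(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    try:
        for line in summarize_models(fetch_local_models()):
            print(line)
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"Could not reach Ollama: {exc}")
```

Running `ollama list` in the terminal gives you similar information without any code.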

Step 3: Run the Model

Start chatting with the model:

ollama run llama3

Example:

>>> Explain Docker in simple words

You now have a local AI assistant running directly on your machine.

No API required.
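
The model is not limited to the terminal. Ollama also serves a local HTTP endpoint, so your own scripts can talk to it. Here is a small sketch using only the Python standard library. It assumes the default address `http://localhost:11434` and that you already pulled `llama3`. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def build_generate_request(
    model: str, prompt: str, base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}  # one JSON reply, no streaming
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local model and return its text answer."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    try:
        print(generate("llama3", "Explain Docker in simple words"))
    except OSError as exc:  # raised when the Ollama server is not running
        print(f"Could not reach Ollama: {exc}")
```

Everything stays on your machine. The request never leaves localhost.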

Step 4: Use It Inside VS Code

Install one of these extensions:

Continue.dev
Cline

Both can use a local Ollama model.

In the Continue.dev config, add the model you pulled:

{
  "models": [
    {
      "title": "Local AI",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b"
    }
  ]
}

Now VS Code can use your local model for:

Code generation
Refactoring
Debugging
Chat
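
A common mistake is naming a model in the editor config that was never pulled. The sketch below checks a config shaped like the Continue.dev example above against a list of installed model names. `missing_models` and `normalize` are hypothetical helpers for illustration, not part of Continue.dev or Ollama.

```python
from typing import List


def normalize(name: str) -> str:
    """Ollama treats a model name without a tag as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(config: dict, installed: List[str]) -> List[str]:
    """Return Ollama models named in the config that are not installed."""
    installed_set = {normalize(name) for name in installed}
    missing = []
    for entry in config.get("models", []):
        if entry.get("provider") != "ollama":
            continue  # only check entries served by Ollama
        if normalize(entry["model"]) not in installed_set:
            missing.append(entry["model"])
    return missing


if __name__ == "__main__":
    config = {
        "models": [
            {"title": "Local AI", "provider": "ollama", "model": "qwen2.5-coder:7b"}
        ]
    }
    # Only llama3 is installed here, so the coder model is reported as missing.
    print(missing_models(config, ["llama3:latest"]))  # ['qwen2.5-coder:7b']
```

If a model shows up as missing, run `ollama pull` for it and reload VS Code.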

Step 5: Open a Chat UI in the Browser

You can also use a ChatGPT-like interface locally.

Install Open WebUI using Docker:

docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main

Open:

http://localhost:3000

Now you have your own private AI chat app.
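
The container can take a little while to start the first time. If the page does not load, a quick reachability check like this sketch tells you whether anything is answering on that port. It assumes the port mapping from the Docker command above (`3000`). `is_reachable` is an illustrative helper.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at the given URL."""
    # Skip any system proxy settings, since the target is a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, timeout, or similar


if __name__ == "__main__":
    url = "http://localhost:3000"
    print("Open WebUI is up" if is_reachable(url) else "Nothing is answering yet")
```

If nothing answers after a minute or two, `docker logs open-webui` usually shows why.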

Recommended Models

Model	Best For
Qwen2.5 Coder	Coding
DeepSeek Coder	Refactoring
Llama 3	General AI
Phi	Low-end laptops
Mistral	Fast responses

Minimum Hardware

Basic setup:

16 GB RAM recommended
SSD storage
An NVIDIA GPU helps a lot

CPU-only works too, but responses are slower.
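
To get a rough idea of whether a model fits in your RAM, multiply the parameter count by the bytes used per weight. The sketch below does that arithmetic. The 4-bit default and the 20% overhead factor are assumptions for illustration (real usage depends on the quantization and context length), so treat the result as a ballpark.

```python
def estimate_model_memory_gb(
    params_billion: float,
    bits_per_weight: int = 4,   # assumption: a 4-bit quantized model
    overhead: float = 1.2,      # assumption: ~20% extra for runtime buffers
) -> float:
    """Rough memory estimate in GB: parameters x bytes per weight x overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)


if __name__ == "__main__":
    # 7 billion weights at 4 bits is 3.5 GB, times 1.2 overhead.
    print(estimate_model_memory_gb(7))                       # 4.2
    # The same model at 16 bits per weight needs far more memory.
    print(estimate_model_memory_gb(7, bits_per_weight=16))   # 16.8
```

By this estimate a 7B model at 4 bits leaves room for your editor and browser on a 16 GB machine, while the 16-bit version would not fit.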

Why Developers Like Local AI

Main reasons:

No monthly API bills
More privacy
Works offline
Full control
Easy experimentation

For daily coding workflows, local LLMs are becoming surprisingly useful.

Cloud models are still stronger for advanced reasoning, but local AI is now good enough for many real-world tasks.

Final Thoughts

If you are spending too much on AI APIs, this is probably the easiest way to reduce costs.

Start simple:

Install Ollama
Pull one coding model
Connect it to VS Code

That alone can cover a large share of your daily AI usage.
