🔮 Dynamic 2026 Model Registry

Local AI & LLM RAM Sizing Hub

Planning to run LLMs locally? Explore our real-time database of **95 open-weights models**, dynamically synced from global model registries. Select a model and a target quantization to estimate its weight footprint and KV-cache requirements, and to find a suitable dual-channel RAM kit.
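As a rough sketch of the weight-footprint step: a model's weights take about total parameters × bits per weight ÷ 8 bytes. The Python below is an illustration under stated assumptions, not the calculator's own code. It assumes 4.5 and 8.5 effective bits per weight for 4-bit and 8-bit (simple block formats that store one FP16 scale per 32 weights land there), an 8 GB reserve for the operating system, and a hypothetical list of two-DIMM kit sizes.

```python
# Rough RAM sizing sketch. Every constant here is an assumption, not a value
# taken from the calculator.

# Assumed effective bits per weight; the 4-bit and 8-bit figures include one
# FP16 scale per block of 32 weights.
BITS_PER_WEIGHT = {"4-bit": 4.5, "8-bit": 8.5, "FP16": 16.0}

# Hypothetical capacities (GB) of common two-DIMM, dual-channel kits.
KIT_SIZES_GB = [16, 32, 48, 64, 96, 128]


def weight_gb(total_params_b: float, quant: str) -> float:
    """Weight footprint in decimal GB: billions of params * bits per weight / 8."""
    return total_params_b * BITS_PER_WEIGHT[quant] / 8


def pick_kit(model_gb: float, kv_cache_gb: float = 0.0, os_reserve_gb: float = 8.0):
    """Smallest kit holding weights + KV cache + OS reserve, or None if none is big enough."""
    needed = model_gb + kv_cache_gb + os_reserve_gb
    for size in KIT_SIZES_GB:
        if size >= needed:
            return size
    return None


w = weight_gb(70, "4-bit")                      # e.g. a 70B dense model at 4-bit
print(f"{w:.1f} GB of weights -> {pick_kit(w, kv_cache_gb=2.0)} GB kit")
```

Under these assumptions a 70B dense model at 4-bit needs about 39.4 GB for its weights; adding 2 GB of KV cache and the 8 GB reserve points to a 64 GB kit.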

Tracked Models: 95
Quantizations: 4-bit, 8-bit, FP16
Key Providers: Meta, DeepSeek, Google, Qwen

Showing 95 of 95 open-weights models

Moonshot Kimi · MoE

Kimi K2.5

Total Params:1000B
Active Params:32B
Context Window:262K tokens
Release:2025/2026

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

DeepSeek · MoE

DeepSeek-V4-Pro (1.6T MoE)

Total Params:1600B
Active Params:49B
Context Window:1M tokens
Release:April 2026

Flagship open reasoning model featuring a 1.6-trillion-parameter Mixture-of-Experts (MoE) architecture with 49 billion active parameters per token. Uses Compressed Sparse Attention (CSA) for extreme long-context memory efficiency.
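The MoE split matters differently for RAM and for speed: every expert has to stay resident in memory, so the footprint follows the total parameter count, while each generated token reads roughly only the active parameters. A common back-of-the-envelope ceiling for decode speed is memory bandwidth divided by the bytes read per token. The sketch below assumes 4 bits per weight and a dual-channel DDR5-6000 setup (two 64-bit channels), and it ignores KV-cache reads, caching, and compute limits, so real throughput would be lower.

```python
# Back-of-the-envelope MoE sizing: RAM follows total params, speed follows active params.
# Assumes weights are streamed once per token; ignores KV-cache traffic and compute limits.


def footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """All experts must be resident, so size by total parameters."""
    return total_params_b * bits_per_weight / 8


def dual_channel_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Peak bandwidth: channels * 64-bit bus (8 bytes) * megatransfers per second."""
    return channels * bus_bytes * mt_per_s / 1000


def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float, bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s if each token reads only the active weights."""
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs / gb_per_token


bw = dual_channel_bandwidth_gbs(6000)           # 96.0 GB/s peak
print(footprint_gb(1600, 4.0))                  # 1.6T total params at 4-bit
print(decode_ceiling_tok_s(49, 4.0, bw))        # 49B active per token
print(decode_ceiling_tok_s(70, 4.0, bw))        # a 70B dense model, for comparison
```

With these assumptions the 1.6T model needs about 800 GB for its 4-bit weights alone, far beyond a dual-channel desktop kit, even though its 49B active parameters would read only about 24.5 GB per token.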

Alibaba Qwen · MoE

Qwen3.6 Max Preview

Total Params:1000B
Active Params:100B
Context Window:262K tokens
Release:2025/2026

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...

Moonshot Kimi · MoE

Kimi K2 0905

Total Params:1000B
Active Params:32B
Context Window:262K tokens
Release:2025/2026

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

Moonshot Kimi · MoE

Kimi K2.6 (1T MoE)

Total Params:1000B
Active Params:32B
Context Window:262K tokens
Release:April 2026

SWE-bench-leading agentic coding and planning flagship from Moonshot AI. Specifically optimized for autonomous engineering, complex tool use, and long-horizon tasks.

Zhipu GLM · MoE

GLM-5 (744B MoE)

Total Params:744B
Active Params:40B
Context Window:256K tokens
Release:February 2026

Zhipu AI's cutting-edge open-weights flagship. Delivers exceptional general reasoning, systems engineering, and multi-turn planning under a permissive MIT license.

DeepSeek · MoE

R1 0528

Total Params:671B
Active Params:37B
Context Window:164K tokens
Release:2025/2026

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

DeepSeek · MoE

R1

Total Params:671B
Active Params:37B
Context Window:164K tokens
Release:2025/2026

DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass....

Alibaba Qwen · MoE

Qwen3 Coder Plus

Total Params:480B
Active Params:35B
Context Window:1M tokens
Release:2025/2026

Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and...

Alibaba Qwen · MoE

Qwen3 Coder 480B A35B (free)

Total Params:480B
Active Params:35B
Context Window:1M tokens
Release:2025/2026

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

Nous Research · Dense

Hermes 4 405B

Total Params:405B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...

Nous Research · Dense

Hermes 3 405B Instruct (free)

Total Params:405B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Meta Llama · MoE

Llama 4 Maverick (400B MoE)

Total Params:400B
Active Params:17B
Context Window:1M tokens
Release:April 2025

Meta's premier large-scale open-weight Mixture-of-Experts flagship, aimed at advanced reasoning, multimodal (text and image) tasks, and code generation.

Alibaba Qwen · MoE

Qwen3.5 397B A17B

Total Params:397B
Active Params:17B
Context Window:262K tokens
Release:2025/2026

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...
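A hybrid design like this changes the KV-cache side of the sizing: layers that use linear attention keep a fixed-size recurrent state instead of a cache that grows with context, so only the full-attention layers add KV memory per token. The sketch below uses the standard K-plus-V cache formula. The layer count, the one-in-four full-attention ratio, the KV-head count, and the head dimension are hypothetical placeholders, not this model's published configuration, and the small fixed linear-attention state is ignored.

```python
# KV-cache estimate for a hybrid model where only some layers use full attention.
# All config numbers below are hypothetical; read the real ones from the model's config.


def kv_cache_gb(attn_layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors (factor 2) per full-attention layer, per token, in decimal GB."""
    return 2 * attn_layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9


LAYERS = 60              # hypothetical total layer count
FULL_ATTN_EVERY = 4      # hypothetical: 1 in 4 layers uses full attention
KV_HEADS, HEAD_DIM = 2, 256
CONTEXT = 262_144        # the listed 262K window

all_full = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
hybrid = kv_cache_gb(LAYERS // FULL_ATTN_EVERY, KV_HEADS, HEAD_DIM, CONTEXT)
print(f"all layers full attention: {all_full:.1f} GB, hybrid: {hybrid:.1f} GB")
```

With these placeholder numbers the hybrid layout needs a quarter of the FP16 KV cache at the full 262K window.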

DeepSeek · MoE

DeepSeek-V4-Flash (284B MoE)

Total Params:284B
Active Params:13B
Context Window:1M tokens
Release:April 2026

High-speed, high-efficiency reasoning variant of the DeepSeek-V4 family, tuned for low-latency responses with a much smaller memory footprint than V4-Pro.

Alibaba Qwen · MoE

Qwen3 VL 235B A22B Thinking

Total Params:235B
Active Params:22B
Context Window:131K tokens
Release:2025/2026

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....

Alibaba Qwen · MoE

Qwen3 VL 235B A22B Instruct

Total Params:235B
Active Params:22B
Context Window:262K tokens
Release:2025/2026

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...

Alibaba Qwen · MoE

Qwen3 235B A22B Thinking 2507

Total Params:235B
Active Params:22B
Context Window:262K tokens
Release:2025/2026

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...

Alibaba Qwen · MoE

Qwen3 235B A22B Instruct 2507

Total Params:235B
Active Params:22B
Context Window:262K tokens
Release:2025/2026

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...

Alibaba Qwen · MoE

Qwen3 235B A22B

Total Params:235B
Active Params:22B
Context Window:131K tokens
Release:2025/2026

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and...

Mistral AI · MoE

Mixtral 8x22B Instruct

Total Params:141B
Active Params:39B
Context Window:66K tokens
Release:2025/2026

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...
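The 8x22B name can suggest 8 × 22 = 176B parameters, but the experts share the attention and embedding weights, which is why the description gives 141B total and 39B active. Assuming a simple split into shared weights plus eight equal-sized experts, with each token routed to two of them, those two figures are enough to solve for both parts. The published 141B and 39B are rounded, so the result is approximate.

```python
# Solve a two-bucket MoE parameter model from published totals:
#   total  = shared + n_experts * expert
#   active = shared + top_k     * expert
# Assumes all experts are the same size; inputs are in billions.


def split_moe_params(total_b: float, active_b: float, n_experts: int, top_k: int):
    expert_b = (total_b - active_b) / (n_experts - top_k)
    shared_b = active_b - top_k * expert_b
    return shared_b, expert_b


shared, expert = split_moe_params(141, 39, n_experts=8, top_k=2)
print(f"shared ≈ {shared:.0f}B, per expert ≈ {expert:.0f}B, one '22B' path ≈ {shared + expert:.0f}B")
print(f"naive 8 x 22B = {8 * (shared + expert):.0f}B counts the shared part {8 - 1} extra times")
```

So roughly 5B of shared weights plus 17B per expert gives the "22B" in the name, and RAM sizing should start from 141B rather than 176B.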

Microsoft · MoE

WizardLM-2 8x22B

Total Params:141B
Active Params:39B
Context Window:66K tokens
Release:2025/2026

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is...

Mistral AI · Dense

Mistral Medium 3.5

Total Params:128B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Mistral AI · Dense

Pixtral Large 2411

Total Params:124B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is...

Mistral AI · Dense

Devstral 2 2512

Total Params:123B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...

Alibaba Qwen · MoE

Qwen3.5-122B-A10B

Total Params:122B
Active Params:10B
Context Window:262K tokens
Release:2025/2026

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Mistral AI · MoE

Mistral Small 4 (119B MoE)

Total Params:119B
Active Params:6.5B
Context Window:256K tokens
Release:March 2026

Mistral AI's premier production-grade MoE model. Unifies instruction following, image/text inputs, and multi-step agentic workflows while activating only 6.5B parameters per token.

Cohere · Dense

Command A

Total Params:111B
Active Params:Dense
Context Window:256K tokens
Release:2025/2026

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...

Meta Llama · MoE

Llama 4 Scout (109B MoE)

Total Params:109B
Active Params:17B
Context Window:10M tokens
Release:April 2025

Meta's long-context open-weights champion, featuring a native 10-million-token context window. Alternates dense and MoE layers to fit on prosumer developer machines with quantization.
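A 10M-token window is mainly a KV-cache question, because the cache grows linearly with context. The sketch below estimates how many tokens of FP16 KV cache fit in the RAM left after loading the weights. The config values (48 layers, 8 KV heads, head dimension 128), the 4.5 bits per weight, the 128 GB kit, and the 8 GB OS reserve are all assumptions to replace with real numbers. The formula also treats every layer as full attention, so any layers that use chunked or local attention would need less.

```python
# How much context fits after the weights? Config values below are assumptions.


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """K and V (factor 2) for every layer, for one token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_context_tokens(ram_gb: float, weights_gb: float, per_token_bytes: int,
                       os_reserve_gb: float = 8.0) -> int:
    """Tokens of KV cache that fit in the RAM left after weights and the OS reserve."""
    free_bytes = (ram_gb - os_reserve_gb - weights_gb) * 1e9
    return max(0, int(free_bytes // per_token_bytes))


per_tok = kv_bytes_per_token(layers=48, kv_heads=8, head_dim=128)   # assumed config
weights = 109 * 4.5 / 8                                             # 109B total at ~4.5 bits/weight
print(per_tok, max_context_tokens(128, weights, per_tok))
print(f"10M tokens would need {per_tok * 10_000_000 / 1e9:.0f} GB of KV cache")
```

Under these assumptions a 128 GB dual-channel kit leaves room for roughly 300K tokens of FP16 cache, so running anywhere near the full 10M window locally would need KV-cache quantization, offloading, or far more memory.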

Alibaba Qwen · MoE

Qwen3 Coder Next

Total Params:80B
Active Params:3B
Context Window:262K tokens
Release:2025/2026

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...

Alibaba Qwen · MoE

Qwen3 Next 80B A3B Thinking

Total Params:80B
Active Params:3B
Context Window:262K tokens
Release:2025/2026

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic...

Alibaba Qwen · MoE

Qwen3 Next 80B A3B Instruct (free)

Total Params:80B
Active Params:3B
Context Window:262K tokens
Release:2025/2026

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...

Alibaba Qwen · Dense

Qwen2.5 VL 72B Instruct

Total Params:72B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

Alibaba Qwen · Dense

Qwen2.5 72B Instruct

Total Params:72B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Qwen2.5 72B is part of the Qwen2.5 series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2: significantly more knowledge and greatly improved capabilities in coding and...

Nous Research · Dense

Hermes 4 70B

Total Params:70B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

DeepSeek · Dense

R1 Distill Llama 70B

Total Params:70B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across...

Meta Llama · Dense

Llama 3.3 70B Instruct (free)

Total Params:70B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Nous Research · Dense

Hermes 3 70B Instruct

Total Params:70B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Meta Llama · Dense

Llama 3.1 70B Instruct

Total Params:70B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases. It has demonstrated strong...

Meta Llama · Dense

Llama 3 70B Instruct

Total Params:70B
Active Params:Dense
Context Window:8K tokens
Release:2025/2026

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...

Alibaba Qwen · MoE

Qwen3.6 35B A3B

Total Params:35B
Active Params:3B
Context Window:262K tokens
Release:2025/2026

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...

Alibaba Qwen · MoE

Qwen3.5-35B-A3B

Total Params:35B
Active Params:3B
Context Window:262K tokens
Release:2025/2026

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Alibaba Qwen · MoE

Qwen 3.6 35B-A3B (MoE)

Total Params:35B
Active Params:3B
Context Window:128K tokens
Release:April 2026

Sparse MoE developer favorite activating only 3 billion parameters per token. Exceptional coding throughput and systems integration efficiency.

Alibaba Qwen · Dense

Qwen3 VL 32B Instruct

Total Params:32B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

Moonshot Kimi · MoE

Kimi K2 0711

Total Params:1000B
Active Params:32B
Context Window:131K tokens
Release:2025/2026

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for...

Alibaba Qwen · Dense

Qwen3 32B

Total Params:32B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

DeepSeek · Dense

R1 Distill Qwen 32B

Total Params:32B
Active Params:Dense
Context Window:128K tokens
Release:2025/2026

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new...

Alibaba Qwen · Dense

Qwen2.5 Coder 32B Instruct

Total Params:32B
Active Params:Dense
Context Window:128K tokens
Release:2025/2026

Qwen2.5-Coder is a series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements over CodeQwen1.5: significant improvements in **code generation**, **code reasoning**...

Google Gemma · Dense

Gemma 4 31B (free)

Total Params:31B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Google Gemma · Dense

Gemma 4 31B (Dense)

Total Params:31B
Active Params:Dense
Context Window:131K tokens
Release:April 2026

Google DeepMind's flagship dense Gemma 4 model, aimed at single-GPU consumer setups. Delivers a strong performance-to-size ratio, making it a popular choice for prosumer developers.

Alibaba Qwen · MoE

Qwen3 VL 30B A3B Thinking

Total Params:30B
Active Params:3B
Context Window:131K tokens
Release:2025/2026

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...

Alibaba Qwen · MoE

Qwen3 VL 30B A3B Instruct

Total Params:30B
Active Params:3B
Context Window:262K tokens
Release:2025/2026

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

Alibaba Qwen · MoE

Qwen3 30B A3B Thinking 2507

Total Params:30B
Active Params:3.3B
Context Window:131K tokens
Release:2025/2026

Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated...

Alibaba Qwen · MoE

Qwen3 Coder 30B A3B Instruct

Total Params:30B
Active Params:3.3B
Context Window:160K tokens
Release:2025/2026

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...
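The "128 experts (8 active per forward pass)" figure describes top-k routing: a small router scores every expert for each token, only the k best-scoring experts run, and their outputs are mixed using renormalized softmax weights. Below is a generic pure-Python sketch of that idea, not Qwen's implementation; the toy experts and router scores are made up for illustration.

```python
import math


def route_top_k(logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Softmax is taken over the selected logits only, which equals a full softmax
    followed by renormalizing the top-k probabilities.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)              # subtract max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, logits, experts, k):
    """Weighted sum of the chosen experts' outputs; the other experts are never evaluated."""
    return sum(w * experts[i](x) for i, w in route_top_k(logits, k))


# Toy setup: 128 "experts" that just scale their input, and made-up router scores.
experts = [lambda x, s=i + 1: s * x for i in range(128)]
logits = [0.01 * i for i in range(128)]
routes = route_top_k(logits, k=8)
print([i for i, _ in routes])                       # the 8 selected experts
print(moe_forward(1.0, logits, experts, k=8))
```

For RAM sizing, the takeaway is that all 128 experts must be loaded, even though each token touches only 8 of them per layer.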

Alibaba Qwen · MoE

Qwen3 30B A3B Instruct 2507

Total Params:30B
Active Params:3.3B
Context Window:262K tokens
Release:2025/2026

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and...

Alibaba Qwen · MoE

Qwen3 30B A3B

Total Params:30B
Active Params:3.3B
Context Window:131K tokens
Release:2025/2026

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...

Alibaba Qwen · Dense

Qwen3.6 27B

Total Params:27B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...

Alibaba Qwen · Dense

Qwen3.5-27B

Total Params:27B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Google Gemma · Dense

Gemma 3 27B

Total Params:27B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Google Gemma · Dense

Gemma 2 27B

Total Params:27B
Active Params:Dense
Context Window:8K tokens
Release:2025/2026

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...

Alibaba Qwen · Dense

Qwen 3.6 27B (Dense)

Total Params:27B
Active Params:Dense
Context Window:128K tokens
Release:April 2026

Alibaba's powerhouse dense developer flagship. Exceptionally strong in multilingual reasoning, structured outputs, and local workflow automation.

Google Gemma · MoE

Gemma 4 26B A4B (free)

Total Params:26B
Active Params:3.8B
Context Window:262K tokens
Release:2025/2026

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Google Gemma · MoE

Gemma 4 26B (MoE)

Total Params:26B
Active Params:3.8B
Context Window:131K tokens
Release:April 2026

Ultra-efficient sparse MoE model from Google DeepMind, activating just 3.8 billion parameters per token. Ideal for fast local inference and constrained hardware environments.

Mistral AI · Dense

Voxtral Small 24B 2507

Total Params:24B
Active Params:Dense
Context Window:32K tokens
Release:2025/2026

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio...

Mistral AI · Dense

Devstral Small 1.1

Total Params:24B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and...

Cognitive Computations · Dense

Uncensored (free)

Total Params:24B
Active Params:Dense
Context Window:33K tokens
Release:2025/2026

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...

Mistral AI · Dense

Mistral Small 3.2 24B

Total Params:24B
Active Params:Dense
Context Window:128K tokens
Release:2025/2026

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

Mistral AI · Dense

Mistral Small 3.1 24B

Total Params:24B
Active Params:Dense
Context Window:128K tokens
Release:2025/2026

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...

Mistral AI · Dense

Saba

Total Params:24B
Active Params:Dense
Context Window:33K tokens
Release:2025/2026

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional...

Mistral AI · Dense

Mistral Small 3

Total Params:24B
Active Params:Dense
Context Window:33K tokens
Release:2025/2026

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...

Mistral AI · Dense

Ministral 3 14B 2512

Total Params:14B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language...

Alibaba Qwen · Dense

Qwen3 14B

Total Params:14B
Active Params:Dense
Context Window:132K tokens
Release:2025/2026

Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

Microsoft · Dense

Phi 4

Total Params:14B
Active Params:Dense
Context Window:16K tokens
Release:2025/2026

[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...

Meta Llama · Dense

Llama Guard 4 12B

Total Params:12B
Active Params:Dense
Context Window:164K tokens
Release:2025/2026

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...

Google Gemma · Dense

Gemma 3 12B

Total Params:12B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Mistral AI · Dense

Mistral Nemo

Total Params:12B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...

Meta Llama · Dense

Llama 3.2 11B Vision Instruct

Total Params:11B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Alibaba Qwen · Dense

Qwen3.5-9B

Total Params:9B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

Mistral AI · Dense

Ministral 3 8B 2512

Total Params:8B
Active Params:Dense
Context Window:262K tokens
Release:2025/2026

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Alibaba Qwen · Dense

Qwen3 VL 8B Thinking

Total Params:8B
Active Params:Dense
Context Window:256K tokens
Release:2025/2026

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...

Alibaba Qwen · Dense

Qwen3 VL 8B Instruct

Total Params:8B
Active Params:Dense
Context Window:256K tokens
Release:2025/2026

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

Alibaba Qwen · Dense

Qwen3 8B

Total Params:8B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...

Meta Llama · Dense

Llama Guard 3 8B

Total Params:8B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification)...

Meta Llama · Dense

Llama 3.1 8B Instruct

Total Params:8B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...

Nous Research · Dense

Hermes 2 Pro - Llama-3 8B

Total Params:8B
Active Params:Dense
Context Window:8K tokens
Release:2025/2026

Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced...

Meta Llama · Dense

Llama 3 8B Instruct

Total Params:8B
Active Params:Dense
Context Window:8K tokens
Release:2025/2026

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases. It has demonstrated strong...

Cohere · Dense

Command R7B (12-2024)

Total Params:7B
Active Params:Dense
Context Window:128K tokens
Release:2025/2026

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning...

Alibaba Qwen · Dense

Qwen2.5 7B Instruct

Total Params:7B
Active Params:Dense
Context Window:131K tokens
Release:2025/2026

Qwen2.5 7B is part of the Qwen2.5 series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2: significantly more knowledge and greatly improved capabilities in coding and...

Mistral AI · Dense

Mistral 7B Instruct v0.1

Total Params: 7B
Active Params: Dense
Context Window: 4K tokens
Release: 2025/2026

A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.

Google Gemma · Dense

Gemma 3n 4B

Total Params: 4B
Active Params: Dense
Context Window: 33K tokens
Release: 2025/2026

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

Google Gemma · Dense

Gemma 3 4B

Total Params: 4B
Active Params: Dense
Context Window: 131K tokens
Release: 2025/2026

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Microsoft · Dense

Phi-4-mini (3.8B)

Total Params: 3.8B
Active Params: Dense
Context Window: 128K tokens
Release: February 2025

Microsoft's lightweight reasoning model, built for fast local text processing and on-device execution.

Mistral AI · Dense

Ministral 3 3B 2512

Total Params: 3B
Active Params: Dense
Context Window: 131K tokens
Release: 2025/2026

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Meta Llama · Dense

Llama 3.2 3B Instruct

Total Params: 3B
Active Params: Dense
Context Window: 131K tokens
Release: 2025/2026

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Meta Llama · Dense

Llama 3.2 1B Instruct

Total Params: 1B
Active Params: Dense
Context Window: 131K tokens
Release: 2025/2026

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...


Understanding Local AI Memory Requirements


Model Weights Size

The parameter count sets the baseline RAM/VRAM footprint. In standard 16-bit floating point (`FP16`), every billion parameters occupies 2GB. Quantizing to roughly 4.5 bits per weight with llama.cpp's 4-bit `Q4_K_M` format compresses this to ~0.56GB per billion, a 72% memory saving in exchange for a small loss of accuracy.
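The per-billion figures above can be reproduced with a short Python sketch. It assumes decimal gigabytes (10^9 bytes) and treats `Q4_K_M` as an effective ~4.5 bits per weight, which is what ~0.56GB per billion implies; the `Q8_0` width of 8.5 bits and the helper name `weights_gb` are illustrative assumptions, not the site's calculator.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in decimal GB (1 GB = 10**9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Assumed effective bit widths; Q4_K_M mixes block formats, so 4.5 is an approximation.
BITS = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.5}

if __name__ == "__main__":
    for name, bits in BITS.items():
        print(f"8B @ {name}: {weights_gb(8, bits):.2f} GB")
    saving = 1 - weights_gb(1, BITS["Q4_K_M"]) / weights_gb(1, BITS["FP16"])
    print(f"Q4_K_M saving vs FP16: {saving:.0%}")
```

With these assumptions an 8B model needs 16GB at `FP16` and about 4.5GB at `Q4_K_M`, and the saving rounds to 72%.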

Context Window KV Cache

The key-value (KV) cache stores attention keys and values for every token in context, so it grows linearly with context length. Very long windows, such as DeepSeek V4's 1M tokens, can demand large RAM allocations purely for context memory. Our calculators use each model's own attention parameters to compute this dynamically.
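For a standard transformer with grouped-query attention, the cache holds one key and one value vector per layer, per KV head, per token. The sketch below assumes that layout, an FP16 cache (2 bytes per element), and Llama 3.1 8B's public configuration (32 layers, 8 KV heads, head dimension 128). Models with compressed attention, such as the Compressed Sparse Attention described for DeepSeek-V4-Pro, would need a different formula; the names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    n_layers: int
    n_kv_heads: int  # fewer KV heads than query heads under grouped-query attention
    head_dim: int


def kv_cache_bytes(shape: AttentionShape, n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes for keys + values across all layers for n_tokens of context."""
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * bytes_per_elem
    return per_token * n_tokens


# Assumed shape of Llama 3.1 8B (from its public config); FP16 cache by default.
LLAMA_31_8B = AttentionShape(n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for tokens in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(LLAMA_31_8B, tokens) / 2**30
        print(f"{tokens:>7} tokens: {gib:.1f} GiB")
```

Under these assumptions the cache costs 128 KiB per token, reaching 16 GiB at the full 131K window. With 32 KV heads instead of 8 (no GQA), it would be four times larger.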


Memory Bandwidth Sizing

Because every generated token requires reading the model's active weights from memory, token generation speed is bounded by memory bandwidth rather than compute. Dual-channel DDR5 delivers substantially more bandwidth than dual-channel DDR4 (about 96 GB/s for DDR5-6000 versus 51.2 GB/s for DDR4-3200), so memory speed and channel count are critical for local AI performance.
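Theoretical peak bandwidth follows from the transfer rate and bus width: each DIMM channel carries 64 data bits (8 bytes) per transfer (DDR5 splits this into two 32-bit subchannels), so peak GB/s = MT/s × 8 × channels. This is a minimal sketch of that arithmetic; it gives an upper bound only, since sustained bandwidth is lower, and the function name is illustrative.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in decimal GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    configs = {
        "DDR4-3200 dual-channel": (3200, 2),
        "DDR5-6000 dual-channel": (6000, 2),
        "DDR5-8400 dual-channel": (8400, 2),
        "DDR5-6000 quad-channel": (6000, 4),
    }
    for label, (speed, ch) in configs.items():
        print(f"{label}: {peak_bandwidth_gbs(speed, ch):.1f} GB/s")
```

Doubling the channel count doubles peak bandwidth at the same speed, which is why quad-channel workstation platforms scale beyond what faster consumer DIMMs alone can reach.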

Frequently Asked Questions

How do you calculate RAM size for a local LLM?

The physical RAM requirement is the sum of three components:

1. **Model Weights Size**: calculated as `(Parameter Count * Bits per Weight) / 8` bytes. MoE architectures require the total parameter weights to be loaded, even though only a subset is active per token.
2. **Context KV Cache Size**: determined by the model's attention shape (number of layers, number of key-value heads, head dimension), the cache precision, and the target context length. Models using Grouped-Query Attention (GQA) drastically reduce this footprint.
3. **OS Overhead**: typically 6GB to 12GB, depending on whether the machine is a single workstation or a multi-GPU configuration.
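Putting the three components together, a rough estimator might look like the sketch below. It assumes ~4.5 bits per weight for `Q4_K_M`, an FP16 KV cache with Llama 3.1 8B's shape (2 × 32 layers × 8 KV heads × 128 dims × 2 bytes = 131,072 bytes per token), and the low end of the 6GB to 12GB overhead range; the function name is hypothetical.

```python
def total_ram_gb(
    params_billion: float,
    bits_per_weight: float,
    kv_bytes_per_token: int,
    context_tokens: int,
    os_overhead_gb: float = 6.0,
) -> float:
    """Sum of weights, KV cache and OS overhead, in decimal GB."""
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB simplifies to this:
    weights_gb = params_billion * bits_per_weight / 8
    kv_gb = kv_bytes_per_token * context_tokens / 1e9
    return weights_gb + kv_gb + os_overhead_gb


if __name__ == "__main__":
    # Assumed Llama 3.1 8B FP16 KV cache: 2 (K and V) * 32 layers * 8 KV heads * 128 dims * 2 bytes.
    kv_per_token = 2 * 32 * 8 * 128 * 2
    need = total_ram_gb(8, 4.5, kv_per_token, context_tokens=32_768)
    print(f"Llama 3.1 8B, Q4_K_M, 32K context: ~{need:.1f} GB")
```

Here the 32K-token cache (~4.3GB) is nearly as large as the quantized weights (4.5GB), which shows why context length belongs in the sizing.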

Why do Mixture-of-Experts (MoE) models require so much RAM?

Mixture-of-Experts models (like DeepSeek V4 or Llama 4 Maverick) activate only a small subset of their parameters for each token, which keeps compute costs low. However, the router can select different experts for every token, so **the weights of all experts must reside in physical RAM/VRAM** to avoid slow loads from disk during inference. RAM sizing must therefore target the *total* parameter count rather than just the *active* one.
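The gap between what must stay resident and what is read per token can be sketched with the DeepSeek-V4-Pro figures from this registry (1600B total, 49B active). This minimal example assumes a uniform ~4.5 bits per weight and simply scales the listed parameter counts; the helper name is hypothetical.

```python
def moe_footprint_gb(total_b: float, active_b: float, bits_per_weight: float) -> tuple[float, float]:
    """Return (GB that must stay resident, GB of weights touched per token)."""
    gb_per_billion = bits_per_weight / 8
    return total_b * gb_per_billion, active_b * gb_per_billion


if __name__ == "__main__":
    resident, per_token = moe_footprint_gb(1600, 49, 4.5)
    print(f"resident: {resident:.0f} GB, read per token: {per_token:.1f} GB")
```

Memory capacity is sized from the first number (about 900GB), while generation speed depends on the second (about 27.6GB per token), because bandwidth only has to serve the active experts.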

Can I run a 70B model on 32GB of RAM?

Not with the whole model held in RAM. At full 16-bit precision a 70B model needs ~140GB, and even 4-bit quantization (e.g., `Q4_K_M`) only compresses the weights to ~39GB, which already exceeds 32GB. Once system overhead and the KV cache are added, a **64GB system memory kit** is the practical minimum for running a 4-bit 70B model stably without out-of-memory crashes.
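The kit choice can be sketched as "estimate the total, then round up to the next standard kit size". This example assumes ~4.5 bits per weight, Llama 3.1 70B's attention shape (80 layers, 8 KV heads, head dimension 128) with an FP16 cache, 6GB of overhead, and a hypothetical list of common DDR5 kit capacities.

```python
from typing import Optional

# Assumed list of common DDR5 dual-channel kit capacities, in GB.
KIT_SIZES_GB = (16, 32, 48, 64, 96, 128, 192)


def llama70b_q4_estimate_gb(context_tokens: int, overhead_gb: float = 6.0) -> float:
    """Rough total for a 70B model at ~4.5 bits/weight with an FP16 KV cache."""
    weights_gb = 70 * 4.5 / 8  # ~39.4 GB
    # Assumed Llama 3.1 70B attention shape: 80 layers, 8 KV heads, head_dim 128, 2 bytes.
    kv_bytes_per_token = 2 * 80 * 8 * 128 * 2
    kv_gb = kv_bytes_per_token * context_tokens / 1e9
    return weights_gb + kv_gb + overhead_gb


def smallest_kit(required_gb: float) -> Optional[int]:
    """Return the smallest listed kit that covers the requirement, or None."""
    for size in KIT_SIZES_GB:
        if size >= required_gb:
            return size
    return None


if __name__ == "__main__":
    for ctx in (8_192, 32_768):
        need = llama70b_q4_estimate_gb(ctx)
        print(f"{ctx} tokens: ~{need:.1f} GB -> {smallest_kit(need)} GB kit")
```

With these assumptions the 8K-context estimate lands just above 48GB, so the next size up, 64GB, is selected; a shorter context or leaner OS could squeeze into 48GB, but with no headroom.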

What is the speed bottleneck for running local LLMs on CPU/RAM?

During token generation, the primary bottleneck is **system memory bandwidth**, not CPU core count, because large language models are heavily memory-bandwidth bound. A typical dual-channel DDR5-6000 configuration peaks at ~96 GB/s, which caps generation at roughly 2.4 tokens/s for a 40GB dense model (96 ÷ 40); real-world speeds are somewhat lower. Upgrading to high-speed dual-channel CUDIMM memory (DDR5-8400 yields ~134 GB/s) or a quad-channel workstation platform is the most direct way to raise local inference speed.
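The 2.4 tokens/s figure is bandwidth divided by the bytes read per token. A minimal sketch, assuming a dense model whose full 40GB is streamed from RAM once per generated token and ignoring KV-cache reads and compute time (so it yields a ceiling, not a prediction):

```python
def max_tokens_per_s(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound: each generated token streams the active weights from RAM once."""
    return bandwidth_gbs / gb_read_per_token


if __name__ == "__main__":
    model_gb = 40.0  # dense model size from the example above
    for label, bw in (("DDR5-6000 dual", 96.0), ("DDR5-8400 dual", 134.4)):
        print(f"{label}: <= {max_tokens_per_s(bw, model_gb):.2f} tokens/s")
```

For an MoE model, `gb_read_per_token` would be the active-parameter size rather than the full model size, which is why MoE models generate faster than dense models of the same total size.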