Local AI & LLM RAM Sizing Hub

CORSAIR DOMINATOR PLATINUM RGB DDR5 RAM 64GB (2x32GB) 5600MHz CL40 Intel XMP iCUE Compatible Computer Memory - White (CMT64GX5M2B5600C40W)

$969.99 ($15.16/GB)

96GB kits

CORSAIR Vengeance RGB DDR5 RAM 96GB (2x48GB) 6000MHz CL30 Intel XMP iCUE Compatible Computer Memory - Black (CMH96GX5M2B6000C30)

$189.99 ($1.98/GB)

A-Tech 96GB Kit (2x48GB) DDR5 5600MHz PC5-44800 CL46 SODIMM 2Rx8 Dual Rank 1.1V Non-ECC Unbuffered SO-DIMM 262-Pin Laptop Computer RAM Memory Upgrade Modules

$1530.87 ($15.95/GB)

128GB kits

A-Tech 128GB Kit (4x32GB) RAM for Apple iMac 2019 & 2020 27 inch Retina 5K | DDR4 2666 MHz SODIMM PC4-21300 / PC4-21333 260-Pin SO-DIMM Max Memory Upgrade

$825.02 ($6.45/GB)

A-Tech 128GB Kit (4x32GB) DDR5 4800MHz PC5-38400 CL40 SODIMM 2Rx8 Dual Rank 1.1V Non-ECC Unbuffered SO-DIMM 262-Pin Laptop Computer RAM Memory Upgrade Modules

$1402.16 ($10.95/GB)
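The per-GB figures in the listings are the kit price divided by its capacity. A minimal sketch that reproduces them from the prices and capacities shown; the shortened kit labels and the helper name `price_per_gb` are illustrative, not taken from the site:

```python
# Kit prices (USD) and capacities (GB) copied from the listings above.
# The labels are shortened for readability.
kits = [
    ("CORSAIR Dominator Platinum 64GB (2x32GB)", 969.99, 64),
    ("CORSAIR Vengeance RGB 96GB (2x48GB)", 189.99, 96),
    ("A-Tech 96GB DDR5 SODIMM (2x48GB)", 1530.87, 96),
    ("A-Tech 128GB DDR4 SODIMM (4x32GB)", 825.02, 128),
    ("A-Tech 128GB DDR5 SODIMM (4x32GB)", 1402.16, 128),
]


def price_per_gb(price_usd: float, capacity_gb: int) -> float:
    """Return the price per GB, rounded to cents."""
    return round(price_usd / capacity_gb, 2)


for name, price, capacity in kits:
    print(f"{name}: ${price_per_gb(price, capacity):.2f}/GB")
```

Sorting the same list by `price_per_gb` is a quick way to compare kits of different capacities on equal terms.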

Showing 96 of 96 open-weights models

MoE · Weights pending

Kimi K3 (2.8T MoE)

Total Params:2800B

Active Params:50B est.

Release:July 2026

Moonshot's open 2.8T-parameter frontier MoE with native vision and a 1M-token context. Official blog: activates 16 of 896 experts (Stable LatentMoE); API live now; full weights expected by July 27, 2026. Active params (~50B) estimated as 2800×16/896 until the technical report publishes an exact figure. RAM sizing uses total parameters for weight footprint.
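The active-parameter estimate above is a simple proportion: total parameters scaled by the share of experts routed per token. A rough sketch of that arithmetic follows; the function name is hypothetical, and the formula treats every parameter as belonging to an expert, so it ignores always-on weights such as attention and embeddings and may differ from the figure in the eventual technical report:

```python
def estimate_active_params_b(total_params_b: float,
                             experts_active: int,
                             experts_total: int) -> float:
    """Estimate active parameters (in billions) as total x active/total experts.

    Assumption: all parameters sit in experts. Shared layers that run for
    every token are not modeled, so this is only a first approximation.
    """
    return total_params_b * experts_active / experts_total


# Kimi K3 figures from the card above: 2800B total, 16 of 896 experts active.
kimi_k3_active_b = estimate_active_params_b(2800, 16, 896)
print(f"Estimated active parameters: {kimi_k3_active_b:.1f}B")
```

The estimate only affects compute per token. All experts still have to be resident in memory, which is why the weight footprint is sized from the 2800B total rather than the roughly 50B active figure.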

DeepSeek-V4-Pro (1.6T MoE)

Total Params:1600B

Active Params:49B

Flagship open reasoning MoE: 1.6T total / 49B active with Compressed Sparse Attention for long-context efficiency. RAM estimates assume GGUF-style quantization; native FP4/FP8 footprints can differ.
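As a rough illustration of how a GGUF-style estimate scales with quantization, the weight footprint is total parameters times bits per weight, divided by 8. The sketch below uses the nominal bits per weight of two llama.cpp block formats (Q8_0 and Q4_0) plus FP16 as a baseline. It is not this site's calculator: real GGUF files mix tensor types, so actual file sizes will differ, and KV cache and runtime overhead are not included.

```python
# Nominal bits per weight. Q8_0 and Q4_0 store 32 weights plus one 16-bit
# scale per block: (32*8 + 16) / 32 = 8.5 and (32*4 + 16) / 32 = 4.5.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def weight_footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB for a parameter count in billions."""
    return total_params_b * bits_per_weight / 8


# DeepSeek-V4-Pro total parameter count from the card above.
for fmt, bpw in BITS_PER_WEIGHT.items():
    size_gb = weight_footprint_gb(1600, bpw)
    print(f"DeepSeek-V4-Pro (1600B) at {fmt}: {size_gb:,.0f} GB")
```

For the 1600B total this gives 3,200 GB at FP16, 1,700 GB at Q8_0, and 900 GB at Q4_0, which shows why the quantization assumption matters as much as the parameter count.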

Kimi K2 0905 (1T MoE)

Release:September 2025

Kimi K2 0905 checkpoint: 1T MoE with 32B active parameters and extended 256K context.

Kimi K2 (1T MoE)

Release:July 2025

Original Kimi K2 open-weight MoE: ~1T total parameters with 32B activated per token. Foundation for the later K2.5 / K2.6 / K2.7-Code family.

Kimi K2 Thinking (1T MoE)

Release:November 2025

Kimi K2 Thinking reasoning variant on the K2 MoE stack: 1T total / 32B active with long-horizon agentic reasoning.

Kimi K2.5 (1T MoE)

Release:January 2026

Moonshot multimodal MoE: 1T total / 32B active (384 experts, 8 selected + 1 shared), 256K context, MoonViT vision encoder. Open weights on Hugging Face.

Kimi K2.6 (1T MoE)

Release:March 2026

Kimi K2.6 agentic coding and planning flagship: 1T MoE with 32B active parameters and 256K context.

Kimi K2.7 Code (1T MoE)

Release:June 2026

Coding-focused Kimi K2.7 open-weight MoE (1T / 32B active, 256K context). Tuned for long-horizon software engineering with ~30% fewer thinking tokens vs K2.6.

Zhipu GLM

GLM-5.2 (744B MoE)

Total Params:744B

Active Params:40B

Release:June 2026

Z.ai / Zhipu flagship open-weight MoE (744B-A40B) with 1M context and IndexShare sparse attention. MIT license; optimized for long-horizon agentic coding.

Zhipu GLM

GLM-5 (744B MoE)

Total Params:744B

Active Params:40B

Release:February 2026

Zhipu AI open-weights MoE flagship (744B total / 40B active). Strong general reasoning and multi-turn planning under MIT license.

Context Window:164K tokens

DeepSeek V3 0324

Total Params:685B

Active Params:37B

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well...

DeepSeek V3.1

Total Params:671B

Context Window:164K tokens

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context...

DeepSeek-R1-0528 (671B MoE)

Total Params:671B

Active Params:37B

Release:May 2025

May 2025 DeepSeek-R1 refresh: same 671B MoE / 37B active architecture with updated post-training.

DeepSeek-R1 (671B MoE)

Total Params:671B

Active Params:37B

Release:January 2025

DeepSeek-R1 reasoning MoE built on DeepSeek-V3: 671B total / 37B activated per token, 128K context. Open weights on Hugging Face.

Qwen3 Coder 480B A35B

Total Params:480B

Active Params:35B

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

MiniMax-01

Total Params:456B

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...

MiniMax-M3 (428B MoE)

Total Params:428B

Active Params:23B

Release:June 2026

Native multimodal MiniMax MoE: ~428B total / ~23B active, 1M context, MiniMax Sparse Attention (MSA). Open weights on Hugging Face.

Hermes 4 405B

Total Params:405B

Hermes 4 is a large-scale reasoning model built on Meta-Llama-3.1-405B and released by Nous Research. It introduces a hybrid reasoning mode, where the model can choose to deliberate internally with...

Hermes 3 405B Instruct

Total Params:405B

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Llama 4 Maverick (400B MoE)

Total Params:400B

Active Params:17B

Release:April 2025

Meta Llama 4 Maverick open-weight MoE: 400B total / 17B active with long-context multimodal capability.

Qwen3.5 397B A17B

Total Params:397B

Active Params:17B

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...

DeepSeek-V4-Flash (284B MoE)

Total Params:284B

Active Params:13B

High-efficiency DeepSeek-V4 MoE variant: 284B total / 13B active with 1M context for lower-latency local inference.

Qwen3 VL 235B A22B Thinking

Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....

Qwen3 VL 235B A22B Instruct

Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...

Qwen3 235B A22B Thinking 2507

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...

Qwen3 235B A22B Instruct 2507

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...

Qwen3 235B A22B

Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and...

Context Window:66K tokens

Mixtral 8x22B Instruct

Total Params:141B

Active Params:39B

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...

Microsoft

Context Window:66K tokens

WizardLM-2 8x22B

Total Params:141B

Active Params:39B

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is...

Mistral Medium 3.5

Total Params:128B

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

Devstral 2 2512

Total Params:123B

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...

Qwen3.5-122B-A10B

Total Params:122B

Active Params:10B

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Mistral Small 4 (119B MoE)

Total Params:119B

Active Params:6.5B

Release:March 2026

Mistral Small 4 production MoE unifying instruction following, multimodal inputs, and agentic workflows with a low active footprint.

Cohere

Command A

Total Params:111B

Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. Compared to other leading proprietary...

Context Window:10M tokens

Llama 4 Scout (109B MoE)

Total Params:109B

Active Params:17B

Release:April 2025

Meta Llama 4 Scout: 109B MoE / 17B active with a native 10M-token context window for long-document workloads.
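To connect a model card like this to the kit capacities listed on this page, one can estimate the weight footprint and pick the smallest kit that leaves some headroom. The sketch below is a rough guide under stated assumptions: the 20% headroom factor and the function name are hypothetical, the bits-per-weight values are the nominal llama.cpp Q4_0 (4.5) and Q8_0 (8.5) figures, GB and GiB are treated as the same unit, and KV cache for long contexts is not modeled (it would need additional RAM at anything near a 10M-token window).

```python
from typing import Optional

# Kit capacities (GB) that appear in the listings on this page.
KIT_SIZES_GB = [64, 96, 128]


def smallest_kit_gb(total_params_b: float,
                    bits_per_weight: float,
                    headroom: float = 1.2) -> Optional[int]:
    """Return the smallest listed kit that holds the weights plus headroom.

    headroom=1.2 is an assumed 20% margin for the OS and runtime, not a
    measured value. Returns None when no listed kit is large enough.
    """
    needed_gb = total_params_b * bits_per_weight / 8 * headroom
    for size_gb in sorted(KIT_SIZES_GB):
        if size_gb >= needed_gb:
            return size_gb
    return None


# Llama 4 Scout: 109B total parameters (from the card above).
print(smallest_kit_gb(109, 4.5))  # Q4_0-style quantization
print(smallest_kit_gb(109, 8.5))  # Q8_0-style quantization
```

Under these assumptions the Q4_0-style estimate needs about 74 GB and lands on the 96GB kit, while the Q8_0-style estimate needs about 139 GB and exceeds every kit listed here.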

Qwen3 Coder Next

Total Params:80B

Active Params:3B

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...

Qwen3 Next 80B A3B Thinking

Total Params:80B

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic...

Qwen3 Next 80B A3B Instruct

Total Params:80B

Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...

Qwen2.5 VL 72B Instruct

Total Params:72B

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

Qwen2.5 72B Instruct

Total Params:72B

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...

Hermes 4 70B

Hermes 4 70B is a hybrid reasoning model from Nous Research, built on Meta-Llama-3.1-70B. It introduces the same hybrid mode as the larger 405B release, allowing the model to either...

R1 Distill Llama 70B

Context Window:8K tokens

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across...

Llama 3.3 70B Instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model...

Hermes 3 70B Instruct

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the...

Llama 3.1 70B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue use cases. It has demonstrated strong...

Mistral Large 3 2512

Total Params:675B

Active Params:41B

Mistral Large 3 2512 is Mistral’s most capable model to date, featuring a sparse mixture-of-experts architecture with 41B active parameters (675B total), and released under the Apache 2.0 license.

Qwen3.6 35B A3B

Total Params:35B

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...

Qwen3.5-35B-A3B

Total Params:35B

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Qwen 3.6 35B-A3B (MoE)

Total Params:35B

Qwen 3.6 sparse MoE activating 3B parameters per token for efficient coding throughput.

Qwen3 VL 32B Instruct

Total Params:32B

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

Qwen3 32B

Total Params:32B

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

Qwen2.5 Coder 32B Instruct

Total Params:32B

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significant improvements in **code generation**, **code reasoning**...

Gemma 4 31B

Total Params:31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Gemma 4 31B (Dense)

Total Params:31B

Google DeepMind Gemma 4 31B dense flagship for single-GPU / high-RAM consumer local inference.

Cohere

North Mini Code (free)

Active Params:3B

North Mini Code is Cohere's first agentic coding model and the debut of its North family. A sparse mixture-of-experts model with 30B total parameters and 3B active, it is optimized...

Qwen3 VL 30B A3B Thinking

Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...

Qwen3 VL 30B A3B Instruct

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

Qwen3 30B A3B Thinking 2507

Context Window:82K tokens

Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated...

Qwen3 Coder 30B A3B Instruct

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...

Qwen3 30B A3B Instruct 2507

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and...

Qwen3 30B A3B

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...

Qwen3.6 27B

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...

Qwen3.5-27B

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Gemma 3 27B

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Gemma 2 27B

Context Window:8K tokens

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...

Qwen 3.6 27B (Dense)

Alibaba Qwen 3.6 dense 27B developer flagship for multilingual reasoning and structured local workflows.

Gemma 4 26B A4B

Total Params:26B

Active Params:4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Gemma 4 26B (MoE)

Total Params:26B

Active Params:3.8B

Ultra-efficient Gemma 4 sparse MoE activating ~3.8B parameters per token for fast local inference.

Voxtral Small 24B 2507

Context Window:32K tokens

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio...

Cognitive Computations

Uncensored

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...

Mistral Small 3.2 24B

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

Mistral Small 3.1 24B

Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and...

Saba

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional...

Mistral Small 3

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed...

Ministral 3 14B 2512

Total Params:14B

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language...

Qwen3 14B

Total Params:14B

Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

Microsoft

Phi 4

Total Params:14B

Context Window:16K tokens

[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion...

Llama Guard 4 12B

Total Params:12B

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...

Gemma 3 12B

Total Params:12B

Mistral Nemo

Total Params:12B

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese,...

MiniMax M2.1

Total Params:230B

Active Params:10B

Context Window:205K tokens

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...

MiniMax M2

Total Params:230B

Active Params:10B

Context Window:205K tokens

MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,...

Qwen3.5-9B

Total Params:9B

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

Ministral 3 8B 2512

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

Qwen3 VL 8B Thinking

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...

Qwen3 VL 8B Instruct

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

Qwen3 8B

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...

Llama 3.1 8B Instruct

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...

Cohere

Command R7B (12-2024)

Total Params:7B

Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning...

Qwen2.5 7B Instruct

Total Params:7B

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...

Gemma 3n 4B

Total Params:4B

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

Gemma 3 4B

Total Params:4B

Microsoft

Phi-4-mini (3.8B)

Total Params:3.8B

Release:February 2025

Microsoft Phi-4-mini dense model for fast on-device and low-RAM local text processing.

Ministral 3 3B 2512

Total Params:3B

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Llama 3.2 3B Instruct

Total Params:3B

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it...

Llama 3.2 1B Instruct

Total Params:1B

Context Window:60K tokens

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...