Context Shift Problem

#4
by Vlad100 - opened

You have produced a wonderful model and I thoroughly enjoy working with it. But there is one very serious problem: context shift does not work with this model in llama.cpp. Very often the entire contents of the context window get recalculated, which takes a very long time for such a large model and context, especially on weak GPUs. It seems that llama.cpp cannot correctly match the current prompt against the previous one. This does not happen with plain Mistral Large 123B. Please try to get this problem sorted out - it is making it very difficult to work with your model.
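As I understand it, llama.cpp only reuses the part of the cached prompt that exactly matches the new one, so a single differing token near the start forces everything after it to be recomputed. A rough illustration of that comparison (not llama.cpp's actual code):

```python
# Hypothetical illustration of prefix-based prompt-cache reuse (not llama.cpp's real code).
# If the new prompt's tokens diverge from the cached ones early on (e.g. a changed
# system prompt, different BOS handling, or a shifted chat template), everything
# from the first mismatch onward has to be re-evaluated.

def common_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

def reuse_report(cached_tokens: list[int], new_tokens: list[int]) -> str:
    keep = common_prefix_len(cached_tokens, new_tokens)
    redo = len(new_tokens) - keep
    return f"reuse {keep} cached tokens, re-evaluate {redo} tokens"

# Example: one token changed at position 2 -> almost the whole prompt is reprocessed.
print(reuse_report([1, 5, 9, 9, 9, 9], [1, 5, 7, 9, 9, 9]))  # reuse 2 cached tokens, re-evaluate 4 tokens
```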

Anthracite org

thanks! though models can't affect that; it must be something in your inferencing frontend or in lcpp/kcpp that is causing the re-processing of tokens.

lucyknada changed discussion status to closed

thanks! though models can't affect that; it must be something in your inferencing frontend or in lcpp/kcpp that is causing the re-processing of tokens.

A long chat with a 16k context window, same settings in SillyTavern. magnum-v2-123b-Q4_K - frequent full context recalculation (not on every reply, but often). The same model quantized by bartowski and mradermacher - same problem. bartowski's Mistral-Large-Instruct-2407-Q2_K (the plain model) - no problems, not a single full context recalculation in two hours of chat...

Anthracite org

we have someone looking into the config.json that might be different; we'll post if we find anything.
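If it helps, the two configs can be diffed directly; a rough sketch (file paths are placeholders, not the actual repos):

```python
# Rough sketch: diff two Hugging Face config.json files to spot keys that differ
# (e.g. rope_theta, max_position_embeddings, sliding_window, bos/eos token ids).
import json

def load(path):
    with open(path) as f:
        return json.load(f)

base = load("Mistral-Large-Instruct-2407/config.json")
tune = load("magnum-v2-123b/config.json")

for key in sorted(set(base) | set(tune)):
    if base.get(key) != tune.get(key):
        print(f"{key}: base={base.get(key)!r}  finetune={tune.get(key)!r}")
```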

lucyknada changed discussion status to open

Hmmm, I can push out a quant with the og model's config,

might also be able to fix it by editing the .gguf's metadata
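For reference, something like the sketch below should show whether the quant's metadata actually differs from the plain model's quant. It uses the gguf Python package from llama.cpp's gguf-py and only compares which keys exist and their types; file names are placeholders:

```python
# Rough sketch: compare the key/value metadata of two GGUF files to spot fields
# that exist in only one of them or have different types. Uses the `gguf`
# Python package that ships with llama.cpp (pip install gguf); paths are placeholders.
from gguf import GGUFReader

def metadata_types(path):
    reader = GGUFReader(path)
    return {name: [t.name for t in field.types] for name, field in reader.fields.items()}

a = metadata_types("magnum-v2-123b-Q4_K_M.gguf")
b = metadata_types("Mistral-Large-Instruct-2407-Q2_K.gguf")

for key in sorted(set(a) | set(b)):
    if a.get(key) != b.get(key):
        print(f"{key}: {a.get(key)}  vs  {b.get(key)}")
```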
