Praise and Criticism

#23
by ChuckMcSneed - opened

Praise

You've cooked really hard with this one. While Mixtral 8x22b felt a bit like a sidegrade to Miqu(Mistral-Medium), this one is definitely an upgrade.

Intelligence

This model feels really intelligent and is able to pick up on the most subtle details, which makes it worth choosing over Command-r-plus. It's really great at coding too.

No positivity bias

While Miqu was overly positive and never dared to do or say anything bad, even when it was in character, this one can, which makes the stories more immersive.

Not castrated like LLAMA

LLAMA 3+ was filtered to hell in the base model, which made it outright unusable to me and many others. It's okay as an assistant, but nothing beyond that. Because of this, Largestral btfos even 405b, in my opinion.

Criticism

Now, the flaws.

GPTisms

Lots of overused phrases from GPT-4 training data; not as bad as with some other models, but still bad. Do you know why people liked Command-r-plus? Because it had none of those; it felt different. Humans really hate GPTisms. They are like a big, fat sign saying "look, I'm ChatGPT!" and feel really soulless. Nobody likes that shiver slop.

Repetition

A minor flaw that can be solved with samplers, but the model seems to pick up on patterns a bit too quickly. Not a very big problem, just an observation.

Overfitting

This is a recurring trait of your past models as well. I don't know why it happens, but I had to raise the temperature to 3 to break up overfitted sentences, which I shouldn't have to do.
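For readers unfamiliar with the sampler workarounds being discussed: raising temperature and applying a repetition penalty are both small transforms on the model's output logits before a token is sampled. Here is a minimal sketch of the two (the function names and values below are illustrative, not taken from this thread or from any particular inference backend; the penalty shown is the common CTRL-style one used by most local stacks):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Discourage tokens that have already been generated.

    CTRL-style penalty: positive logits are divided by the penalty and
    negative logits are multiplied by it, so a repeated token becomes
    less likely either way.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out


def apply_temperature(logits, temperature=1.0):
    """Flatten (t > 1) or sharpen (t < 1) the output distribution.

    A temperature of 3, as mentioned above, makes unlikely continuations
    far more probable, which is why it can break overfitted phrasings
    (at the cost of overall coherence).
    """
    return [logit / temperature for logit in logits]
```

Usage: `apply_repetition_penalty([2.0, 0.5, -1.0], generated_ids=[0, 2])` lowers the logits of tokens 0 and 2 while leaving token 1 untouched, and chaining `apply_temperature(..., 3.0)` afterward flattens whatever distribution remains.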

Mistral AI_ org

@ChuckMcSneed Hi!

Thank you so much for all the feedback!

Could you share a bit more on the overfitted aspect? Which sentences did you find overfitted?

@pandora-s Mostly phrases which are overused by OpenAI's GPT models. Here are some:

a mix of X and Y
Ah,
tapestry
mischievous
smirk
chuckle
husky voice
barely above a whisper
couldn't help but
shivers
maybe, just maybe
a testament to
cold and calculating
growl
lean in
LOTS of phrases with eyes
Mistral AI_ org

I see, could you also share some feedback related to those "gptisms"? 🤔

"bustling" has now become my least favorite word

That being said, this is my favorite open weights model now.

Mistral AI_ org

Thanks a lot for all the feedback! If you run into any other issues, feel free to share; we are always open to feedback to improve our models!

Just to add some more to the praise side of things here, I really appreciate that this model just zeroes in on what you tell it to do. No fluff, no introducing things when you don't ask it to, and not much in the way of convoluted wrap-up sentences either. At the same time, if you ask it to write at length, it will do so, and in a focused way. This flexibility is a real achievement. I have a bunch of hard, long-context tasks, and this is the first open-weights model to nail them all. It's the best locally runnable model, IMO. Llama 3 and 3.1 might edge it out in some benchmarks, but in actual, everyday use, it's top dog.

Ditto. This model has replaced Cohere CR+ as my daily driver. It's more precise than CR+ at early context depths, but it does fall off quicker than CR+ past 32k (my own impressions, verified by RULER). It also follows instructions better than CR+.
I agree with @ChuckMcSneed that some sampler judo is needed to squash some of the less desirable aspects, but it at least has the capability to respond to it. With CR+, you could neutralize all samplers but temperature at 1 and get something fresh and coherent over and over with the same opening prompt.
All in all well done and the best open-weight model on the block.
