microsoft/Phi-3.5-mini-instruct · Why not include MedQA in your benchmarks?

30 days ago

It's one of the good reasoning benchmarks, built on USMLE questions. It was included in the Phi-3 benchmarks and in the June update, so it makes sense to include it in the Phi-3.5 benchmarks too, no?

Thanks for the model and all your work too!

nguyenbh

Microsoft org 30 days ago

edited 30 days ago

Thank you for your interest in the Phi-3.5 models! We did benchmark MedQA 🩺, but we will let the community run this benchmark themselves (hint: we think Phi-3.5 MoE and Mini are very competitive 🌞).

Hugman2345

29 days ago

edited 29 days ago

It's great and competes with much bigger models on USMLE/medical questions, information, and reasoning. In this area, Phi-3.5 is better than other 7B, 8B, and 9B competitors, and its bigger context size is a plus. Sadly, it feels like it doesn't beat Phi-3-small-8k and Phi-3-medium-4k in this particular area. This is just from first impressions and needs to be confirmed by others. It's definitely so much better than other tiny models that it's not even remotely close.

Thanks for Phi-3.5! I don't know how such a small model is even close to the level of big models.

nguyenbh

Microsoft org 28 days ago

edited 25 days ago

@Hugman2345 Thank you for your effort in independently benchmarking the Phi-3.5 models on MedQA. It is great to see that the models perform within our expectations.