Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

28 points - today at 6:12 AM

Comments

ssgodderidge today at 10:03 AM
The example model in the documentation is 4o-mini; you might want to update that to a more recent model.

As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it decides to load skills in the first place.

egeozcan today at 9:20 AM
Are there any published results gathered using this?

ianhxu today at 10:08 AM
How do you iterate on the judge prompt? Is there an auto rater?