Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs
28 points - today at 6:12 AM
ssgodderidge today at 10:03 AM
The example model in the documentation is 4o-mini; you might want to update that to a more recent model.
As an aside, 4o-mini came out months before agent skills were released… I’m curious how well it does at choosing to load skills in the first place.
egeozcan today at 9:20 AM
Are there any published results gathered using this?
ianhxu today at 10:08 AM
How do you iterate on the judge prompt? Is there an auto rater?