[wip] Eval and Vibes
2025-04-13•1 min read
[WIP]
notes
- vibes are everything
- Teams treat evals as the goal, and we get very spotty performance
- Hard to get universal performance right, or it is specific performance masquerading as universal performance
- Anthropic and Cursor are good examples: power users love them, so there is a vibe
- Lots of normal people use ChatGPT in ways that no one anticipates
- Vibes are getting more important
- Comparative vs absolute
- "LLMs as mirrors" effect: the better you are
- What do we care about?
- Human standards keep rising; models score lower, but only in comparison to previous models
- We measure humans based on what they are given and what they can do