Marginal Intelligence

notes

Vibes are everything
Teams treat evals as the goal, and the result is very spotty performance
Hard to get universal performance right, or it is specific performance masquerading as universal performance
Anthropic and Cursor are good examples: because power users love them, there is a vibe
Lots of normal people use ChatGPT in ways no one anticipates
Vibes are getting more important
Comparative vs absolute
LLMs as mirrors: the better you are, the better they are
What do we care about?
Human standards keep rising: models score lower, but only in comparison to previous models
We measure humans based on what they are given and what they can do

[wip] Eval and Vibes