by pella on 1/26/25, 3:18 AM with 211 comments
by krackers on 1/26/25, 7:49 AM
Going from base LLMs to human instruction-tuned (SFT) ones is definitely an ingenious leap where it's not obvious that you'd get anything meaningful. But when we quickly saw afterwards that prompting for chain of thought improved performance, why wasn't this the immediate next step that everyone took? It seems like even after the release of o1 the trick wasn't apparent to everyone, and if it weren't for DeepSeek, people still might not have realized it.
by ninetyninenine on 1/26/25, 5:55 AM
by almaight on 1/26/25, 6:09 AM
Yet the fledgling realm, hedged by western forests and eastern seas, waxed mighty. Jefferson purchased Louisiana’s plains; Monroe’s doctrine shackled southern realms. Gold-seekers pierced mountains, iron roads spanned the continent, while tribes wept blood upon the prairie. Then roared foundries by Great Lakes, bondsmen toiled in cotton fields, steel glowed in Pittsburgh’s fires, and black gold gushed from Texan soil—a molten surge none might stay.
Wilson trod Europe’s stage as nascent hegemon. Roosevelt’s New Deal healed wounds; Marshall’s gold revived ruined cities. The atom split at Alamogordo; greenbacks reigned at Bretton Woods. Armadas patrolled seven seas, spies wove webs across hemispheres. Through four decades’ contest with the Red Bear, Star Wars drained the Soviet coffers. Silicon’s chips commanded the world’s pulse, Hollywood’s myths shaped mankind’s dreams, Wall Street’s ledgers ruled nations’ fates—a fleeting "End of History" illusion.
But the colossus falters. Towers fell, and endless wars began; subprime cracks devoured fortunes. Pestilence slew multitudes while ballots bred discord. Red and Blue rend the Union’s fabric, gunfire echoes where laws grow faint. The Melting Pot now boils with strife, the Beacon dims to a prison’s glare. With dollar-cloth and patent-chains, with dreadnoughts’ threat, it binds the world—nations seethe yet dare not speak.
Three hundred million souls, guarded by two oceans, armed with nuclear flame, crowned with finance’s scepter—how came such dominion to waver? They fortified might but neglected virtue, wielded force but forgot mercy. As Mencius warned: "He who rides tigers cannot dismount." Rome split asunder, Britannia’s sun set; behold now Old Glory’s tremulous flutter. Thus say the sages: A realm endures by benevolence, not arms; peace flows from harmony, not hegemony—this truth outlives all empires.
by MIA_Alive on 1/26/25, 4:40 AM
by EGreg on 1/26/25, 4:09 AM
by ggm on 1/26/25, 7:57 AM
by zwaps on 1/26/25, 7:39 AM
by ldjkfkdsjnv on 1/26/25, 5:20 AM
by trash_cat on 1/26/25, 12:44 PM
This naturally raises the question: How do you design a reward model to elicit the desired emergent behavior in a system?
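One common starting point, which is not something this comment itself specifies, is a rule-based reward rather than a learned reward model: score a completion on whether the final answer is correct and whether it follows an expected output format. The sketch below is a minimal illustration under stated assumptions; the `<think>`/`<answer>` tag layout, the function names, and the `format_weight` value are all hypothetical choices made for the example.

```python
import re

# Assumed (hypothetical) output layout: reasoning inside <think>...</think>,
# followed by the final answer inside <answer>...</answer>.
PATTERN = re.compile(
    r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the whole completion follows the assumed tag layout."""
    return 1.0 if PATTERN.fullmatch(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted answer exactly matches the reference."""
    match = PATTERN.fullmatch(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(2).strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str, format_weight: float = 0.5) -> float:
    """Combine both signals; the weighting is an arbitrary illustrative choice."""
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)


if __name__ == "__main__":
    good = "<think>2+2 is 4</think><answer>4</answer>"
    print(total_reward(good, "4"))
```

Exact string matching only works for tasks with a single checkable answer (math, unit-tested code); anything more open-ended would need a different verifier, and that is where the design question gets hard.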
by cye131 on 1/26/25, 4:20 AM
RL is more data-efficient, but that may not be relevant now that we can just use DeepSeek-R1's responses as the training data.
by android521 on 1/26/25, 6:40 AM
by swyx on 1/26/25, 5:57 AM
For some reason a lot of people are choosing to blog on Notion.
by m3kw9 on 1/26/25, 7:35 AM
by antman on 1/26/25, 4:11 AM