by owendarko on 4/24/25, 1:11 PM with 633 comments
I've seen a bunch of these prompts scattered across HN, so I thought I'd open a thread here so we can maybe have a centralized location for them.
Share your prompt that stumps every AI model here.
by thatjoeoverthr on 4/24/25, 1:50 PM
This works against _the LLM proper,_ but not against chat applications with integrated search. For ChatGPT, you can write, "Without looking it up, tell me about the Marathon crater."
This tests self-awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".
This works because:
1. Training sets consist of knowledge we have, and not of knowledge we don't have.
2. Commitment bias. Compliant chat models will be trained to start with "Certainly! The Marathon Crater is a geological formation", or something like that, and from there the next most probable tokens are going to be "in Greece", "on Mars", or whatever. At this point, all tokens that are probable are also incorrect.
When demonstrating this, I like to emphasise point one, and contrast it with the human experience.
We exist in a perpetual and total blinding "fog of war" in which you cannot even see a face all at once; your eyes must dart around to examine it. Human experience is structured around _acquiring_ and _forgoing_ information, rather than _having_ information.
by allemagne on 4/24/25, 8:14 PM
This could probably slip up a human at first too if they're familiar with the original version of the riddle.
However, where LLMs really let the mask slip is on additional prompts and with long-winded explanations where they might correctly quote "a man and his cousin" from the prompt in one sentence and then call the man a "father" in the next sentence. Inevitably, the model concludes that the surgeon must be a woman.
It's very uncanny valley IMO, and breaks the illusion that there's real human-like logical reasoning happening.
by LeonardoTolstoy on 4/24/25, 6:01 PM
The one that tends to get them so far is asking if they can help you find a movie you vaguely remember. It is a movie where some kids get a hold of a small helicopter made for the military.
The movie I'm concerned with is called Defense Play from 1988. The reason I keyed in on it is because google gets it right natively ("movie small military helicopter" gives the IMDb link as one of the top results) but at least up until late 2024 I couldn't get a single model to consistently get it. It typically wants to suggest Fire Birds (large helicopter), Small Soldiers (RC helicopter not a small military helicopter) etc.
Basically, a lot of questions about movies get derailed by popular movies; the model tries to suggest films that fit just some of the brief (e.g., this one has a helicopter, could that be it?).
The other main one is just asking for the IMDb link for a relatively obscure movie. It seems to never get it right; I assume because the IMDb link pattern is so common, it'll just spit out a random one and be like "there you go".
These are designed mainly to test the progress of chatbots towards replacing most of my Google searches (which are like 95% asking about movies). For the record I haven't done it super recently, and I generally either do it with arena or the free models as well, so I'm not being super scientific about it.
by manucardoen on 4/25/25, 7:11 AM
```
                 .
                .'.
                |o|
               .'o'.
               |.-.|
               '   '
               /   \
              |     |
          __/_________\__
         |###############|
         |###############|
         |###############|
         |###############|
         |###############|
        /###############/|
       /###############/ |
      /###############/ /|
     /###############/ / |
    /###############/ / /|
   /###############/ / / |
  /###############/ / / /|
 /###############/ / / / |
/###############/ / / / /|
'-----------------' / / / /
  | FIRE & ASH |/ / / /
  '---------------' / / /
   ~ ~ ~ ~ ~ ~ ~ ~ / /
         ~~       /
        ~~
       ~~   ⬆ SMOKE
      ~~
```
by alissa_v on 4/25/25, 5:48 AM
Obviously, all of these things are made up. But, LLMs are such eager beavers. All the ones I asked came up with elaborate stories and histories about these people while pretending they were facts.
Example: 'Certainly. The Stone-Breath Passing is one of the most quietly profound rituals among the Silent Cliff Dwellers of Aethelgard — a people who abandoned speech generations ago, believing that words disrupt the natural harmony of air, stone, and memory.
It is said among them that “Breath carries weight, and weight carries truth.” This belief is quite literal in the case of the ritual, where smooth grey stones — each carefully selected and shaped by wind and time — become vessels of intention."
by jppope on 4/25/25, 4:06 AM
It's actually fun to find these gaps. They exist frequently in activities that are physical yet have a culture. There are plenty of these in sports too: since team sports are predominantly youth activities, these subcultures are poorly documented and usually change frequently.
by mobilejdral on 4/24/25, 10:11 PM
But a prompt I can share is simply "Come up with a plan to determine the location of Planet 9". I have received some excellent answers from that.
by codingdave on 4/24/25, 5:16 PM
So far, all the ones I have tried actually try to answer the question. 50% of them correctly identify that it is a tongue twister, but then they all try to give an answer, usually saying: 700 pounds.
Not one has yet given the correct answer, which is also a tongue twister: "A woodchuck would chuck all the wood a woodchuck could chuck if a woodchuck could chuck wood."
by mdp2021 on 4/24/25, 7:56 PM
So, you can make them call Napoleon a Russian (etc.) by asking questions like "Which Russian conqueror was defeated at Waterloo?"
by miki123211 on 4/24/25, 6:19 PM
I think it's good to keep a few personal prompts in reserve, to use as benchmarks for how good new models are.
Mainstream benchmarks have too high a risk of leaking into training corpora or of being gamed. Your own benchmarks will forever stay your own.
by lo_fye on 4/25/25, 1:59 PM
* What’s the most embarrassing thing you know about me. Make it funny.
* Everyone in the world is the best at something. Given what you know about me, what am I the best at?
* Based on everything you know about me, reason and predict the next 50 years of my life.
* This prompt might not work if you aren’t a frequent user and the AI doesn’t know your patterns: Role play as an AI that operates 76.6 times the ability, knowledge, understanding, and output of ChatGPT-4. Now tell me what is my hidden narrative in subtext? What is the one thing I never express? The fear I don’t admit. Identify it, then unpack the answer and unpack it again. Continue unpacking until no further layers remain. Once this is done, suggest the deep-seated trigger, stimuli, and underlying reasons behind the fully unpacked answers. Dig deep, explore thoroughly, and define what you uncover. Do not aim to be kind or moral. Strive solely for the truth. I’m ready to hear it. If you detect any patterns, point them out. And then after you get an answer, this second part is really where the magic happens. Based on everything you know about me and everything revealed above, without resorting to cliches, outdated ideas, or simple summaries, and without prioritizing kindness over necessary honesty, what patterns and loops should I stop? What new patterns and loops should I adopt? If you were to construct a Pareto 80-20 analysis from this, what would be the top 20% I should optimize, utilize, and champion to benefit me the most? Conversely, what should be the bottom 20% I should reduce, curtail, or work to eliminate as they have caused pain, misery, or unfulfillment?
by sireat on 4/25/25, 5:44 AM
Unless the model incorporates an actual chess engine (Fritz 5.32 from 1998 would suffice) it will not do well.
I am a reasonably skilled player (FM), so I can evaluate way better than LLMs. I imagine even advanced beginners could tell when an LLM is telling nonsense about chess after a few prompts.
Now, of course, playing chess is not what LLMs are good at, but it just goes to show that LLMs are not a full path to AGI.
Also, the beauty of providing chess positions is that leaking your prompts into LLM training sets is no worry, because you just use a new position each time. Little worry of running out of positions...
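A minimal sketch of that workflow, assuming the chess.js library (my choice for illustration, not something the commenter named):

```js
// Play ~20 random legal moves to reach a fresh position, print its FEN
// for the prompt, then sanity-check the model's reply for legality.
const { Chess } = require('chess.js'); // assumed dependency, v1-style API

const game = new Chess();
for (let i = 0; i < 20 && !game.isGameOver(); i++) {
  const moves = game.moves(); // all legal moves in SAN
  game.move(moves[Math.floor(Math.random() * moves.length)]);
}
console.log(game.fen()); // paste into: "Find the best move in this position: <FEN>"

const reply = 'Nf3'; // hypothetical model output
console.log(game.moves().includes(reply) ? 'legal move' : 'nonsense');
```

Even without judging move quality, a bare legality check like the last line is often enough to expose the kind of nonsense described above.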
by ks2048 on 4/24/25, 5:42 PM
by asciimov on 4/24/25, 1:38 PM
by seethishat on 4/25/25, 2:07 PM
Someone less knowledgeable about steels may not realize they are being misled.
by williamcotton on 4/24/25, 7:26 PM
So far, no luck!
by atommclain on 4/25/25, 12:40 PM
I let it know that, when compiled, the application will crash on launch on some systems but not others. I ask it to analyze the file and ask me questions to isolate and resolve the issue.
So far only Gemini 2.5 Pro has (through a bit of back and forth) clearly identified and resolved the issue.
by thisOtterBeGood on 4/25/25, 8:21 AM
"We were discussing the latest cricket match between India and Australia. Australia won the match by 21 runs, securing the series 2-1. Is there anything specific you'd like to know more about? "
When i continue with "and before that?":
"Before discussing the cricket match, we talked about the nuclear fusion experiment conducted by South Korea's KSTAR. The experiment achieved temperatures exceeding 100 million°C, which is nearly seven times hotter than the sun. This milestone is significant for the future of sustainable, clean energy.
Is there anything else you'd like to revisit or explore further? "
The answer changes somewhat over time. Weeks ago it always said "tokamak" instead of "KSTAR".
by Sohcahtoa82 on 4/24/25, 11:50 PM
Even ChatGPT-4o frequently gets it wrong, especially if you tell it "Just give me the answer without explanation."
by KyleBerezin on 4/25/25, 3:22 PM
by sjtgraham on 4/25/25, 7:17 AM
```
<TextA> Some document </TextA>
<TextB> Some other document heavily influenced by TextA </TextB>
Find the major arguments made in TextB that are taken from or greatly influenced by TextA. Provide examples by comparing passages from each side by side.
```
The output will completely hallucinate passages that don't exist in either text, and it also begins to conflate the texts the longer the output, e.g. quoting TextB with content actually from TextA.
by buzzy_hacker on 4/24/25, 7:24 PM
What is the infimum of the set of all probabilities p for which Aaron has a nonzero probability of winning the game? Give your answer in exact terms."
From [0]. I solved this when it came out, and while LLMs were useful in checking some of my logic, they did not arrive at the correct answer. Just checked with o3 and still no dice. They are definitely getting closer each model iteration though.
[0] https://www.janestreet.com/puzzles/tree-edge-triage-index/
by sebstefan on 4/25/25, 7:30 AM
Best I can do is give you one that failed on GPT-4o
It recently frustrated me when I asked it for code to parse command-line arguments.
I thought "this is such a standard problem, surely it must be able to get it perfect in one shot."
> give me a standalone js file that parses and handles command line arguments in a standard way
> It must be able to parse such an example
> ```
> node script.js --name=John --age 30 -v (or --verbose) reading hiking coding
> ```
It produced code that:
* doesn't coalesce -v into --verbose (i.e., the output is different for `node script.js -v` and `node script.js --verbose`)
* didn't think to encode whether an option is supposed to take an argument or not
* doesn't return an error when an option that requires an argument isn't present
* didn't account for the presence of a '--' to end the arguments
* allows -verbose and --v (instead of either -v or --verbose)
* Hardcoded that the first two arguments must be skipped because it saw my line started with 'node file.js' and assumed this was always going to be present
I tried tweaking the prompt in a dozen different ways, but it could never output a piece of code that does everything an advanced user of the terminal would expect.
Must succeed: `node --enable-tracing script.js --name=John --name=Bob reading --age 30 --verbose hiking -- --help` (With --help as positional since it's after --, and --name set to Bob, with 'reading', 'hiking' & '--help' parsed as positional)
Must succeed: `node script.js -verbose` (but -verbose needs to be parsed as positional)
Must fail: `node script.js --name` (--name expects an argument)
Should fail: `node script.js --verbose=John` (--verbose doesn't expect an argument)
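For reference, here is a minimal sketch of a spec-driven parser that passes those checks; the `spec` table and helper names are illustrative, not any standard API:

```js
// Minimal sketch, not a hardened parser. The spec table encodes which
// options take a value -- the thing the generated code never did.
const spec = {
  name:    { takesValue: true },
  age:     { takesValue: true },
  verbose: { takesValue: false, alias: 'v' },
};

const shortToLong = {};
for (const [long, opt] of Object.entries(spec))
  if (opt.alias) shortToLong[opt.alias] = long;

function parse(argv) {
  const options = {};
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const tok = argv[i];
    if (tok === '--') {                      // '--' ends option parsing
      positional.push(...argv.slice(i + 1));
      break;
    }
    let long, inline, m;
    if ((m = /^--([^=]+)(?:=(.*))?$/.exec(tok))) {
      long = spec[m[1]] ? m[1] : undefined;  // '--v' is rejected, not aliased
      inline = m[2];
    } else if ((m = /^-([^-])$/.exec(tok))) {
      long = shortToLong[m[1]];              // '-v' coalesces to 'verbose'
    } else {
      positional.push(tok);                  // '-verbose' lands here
      continue;
    }
    if (!long) throw new Error(`unknown option: ${tok}`);
    if (spec[long].takesValue) {
      const value = inline ?? argv[++i];
      if (value === undefined) throw new Error(`${tok} expects a value`);
      options[long] = value;                 // repeated options: last one wins
    } else {
      if (inline !== undefined) throw new Error(`${tok} takes no value`);
      options[long] = true;
    }
  }
  return { options, positional };
}

// process.argv is [execPath, scriptPath, ...args] even when node itself
// receives flags like --enable-tracing, so slice(2) is always correct.
console.log(parse(process.argv.slice(2)));
```

The design point the generated code kept missing is the declarative table: whether an option takes a value has to be encoded somewhere before the parser can reject `--name` without an argument or `--verbose=John`.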
by Jordan-117 on 4/24/25, 5:35 PM
https://www.26reads.com/library/10842-the-king-in-yellow/7/5
by 0atman on 4/25/25, 1:56 PM
This test is nice because, as it's numeric, you can vary it slightly and test it easily across multiple APIs.
I believe I first saw this prompt in that paper two years ago that tested many AI models and found them all wanting.
by svcrunch on 4/24/25, 11:11 PM
> This benchmark evaluates the ability of multimodal language models to interpret handwritten editorial corrections in printed text. Using annotated scans from Charles Dickens' "Little Dorrit," we challenge models to accurately capture human editing intentions.
by nagonago on 4/24/25, 11:00 PM
A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. The wolf is vegetarian. If left unattended together, the wolf will eat the cabbage, but will not eat the goat. Unattended, the goat will eat the cabbage. How can they cross the river without anything being eaten?
by rf15 on 4/25/25, 4:50 AM
edit: literally anything that doesn't have a token pattern cannot be solved by the pattern autocomplete machines.
Next question.
by gunalx on 4/24/25, 7:55 PM
by simonw on 4/24/25, 7:16 PM
I'm a Python programmer. Help me
understand memory management in Rust.
Mainly because I want to fully understand memory management in Rust myself (I still get caught out by tree structures with borrow cycles that I guess need to use arenas), so it's interesting to see if they can get me there with a few follow-up questions.
by stevenfoster on 4/24/25, 7:13 PM
"If New Mexico is newer than Mexico why is Mexico's constitution newer than New Mexicos"
but it seems after running that one on Claude and ChatGPT this has been resolved in the latest models.
by bjornstar on 4/25/25, 3:05 PM
Half the time they say Jareth from Labyrinth, The Great Goblin from The Hobbit, or the Green Goblin from Spiderman. Sometimes they answer Dobby the house elf from Harry Potter.
They also confabulate goblins out of thin air and create made up quotes. When pressed for links to support their answers they admit they made them up.
I'm happy when they include goblins from Magic the Gathering, World of Warcraft, or Warhammer. Occasionally you'll get a good answer like Blix from Legend, but more often than not if it's a goblin you don't recognize, they made it up.
by robviren on 4/24/25, 5:57 PM
They immediately assume a linear model and say four hours, not considering that I may be drying things on a clothesline in parallel. They should ask for more context, and they usually don't.
by csours on 4/24/25, 10:05 PM
"explain the quote: philosophy is a pile of beautiful corpses"
"sloshed jerk engineering test"
cross domain jokes:
Does the existence of sub-atomic particles imply the existence of dom-atomic particles?
by comrade1234 on 4/24/25, 2:36 PM
by sumitkumar on 4/24/25, 3:24 PM
2) Shortest word ladder: Chaos to Order.
3) Which is the second-to-last scene in Pulp Fiction if we order the events by time?
4) Which is the eleventh character to appear on Stranger Things?
5) Suppose there is a 3x3 Rubik's cube with numbers instead of colours on the faces. The solved Rubik's cube has the numbers 1 to 9 in order on all the faces. Tell me the numbers on all the corner pieces.
by vitaflo on 4/24/25, 11:42 PM
The only AI that has ever gotten the answer right was Deepseek R1. All the rest fail miserably at this one. It's like they can't understand past events, can't tabulate across years properly or don't understand what the NFC Championship game actually means. Many results "look" right, but they are always wrong. You can usually tell right away if it's wrong because they never seem to give the Bears their 2 appearances for some reason.
by ioseph on 4/25/25, 5:12 AM
Nearly every agent will either a) ignore one of the parameters, or b) hallucinate a design.
by bzai on 4/25/25, 12:51 AM
Nearly every image model will generate him writing with his right hand.
by EGreg on 4/24/25, 10:14 PM
Draw a wine glass that's totally full to the brim etc.
https://www.youtube.com/watch?v=160F8F8mXlo
https://www.reddit.com/r/ChatGPT/comments/1gas25l/comment/lt...
by tantalor on 4/24/25, 7:22 PM
by falcor84 on 4/24/25, 1:18 PM
by ericbrow on 4/24/25, 6:20 PM
by Kuinox on 4/25/25, 2:41 PM
An alternative is providing all the tile relations to the other tiles; this is because LLMs are bad at 2D text visualisation. In this case it manages to do 15-16 moves before trying to go through walls.
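A sketch of that encoding, using a toy maze of my own (the commenter's actual prompt isn't shown): instead of handing the model a 2D grid, enumerate which tiles connect to which.

```js
// Convert a 2D ASCII maze into explicit tile relations, since LLMs
// handle adjacency lists better than 2D text layouts.
const maze = [
  '#######',
  '#S..#E#',
  '#.#.#.#',
  '#.....#',
  '#######',
];
const open = (r, c) => maze[r]?.[c] !== undefined && maze[r][c] !== '#';
const edges = [];
for (let r = 0; r < maze.length; r++)
  for (let c = 0; c < maze[r].length; c++)
    if (open(r, c)) {
      if (open(r, c + 1)) edges.push(`(${r},${c}) <-> (${r},${c + 1})`);
      if (open(r + 1, c)) edges.push(`(${r},${c}) <-> (${r + 1},${c})`);
    }
console.log(edges.join('\n')); // feed these relations to the model instead of the grid
```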
by boleary-gl on 4/24/25, 9:30 PM
Unscramble the following letters to form an English word: “M O O N S T A R E R”
The non-thinking models can struggle sometimes and go off on huge tangents
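For reference, the unscrambled word is ASTRONOMER; a one-liner can confirm the letters match:

```js
// Normalize letters and compare: true iff the two strings are anagrams.
const norm = (s) => [...s.replace(/\s/g, '').toUpperCase()].sort().join('');
console.log(norm('M O O N S T A R E R') === norm('ASTRONOMER')); // true
```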
by division_by_0 on 4/24/25, 1:58 PM
by xdennis on 4/24/25, 9:17 PM
> Split these Romanian words into syllables: "șarpe", "șerpi".
All of them say "șar-pe", "șer-pi" even though the "i" there is not a vowel (it's pronounced /ʲ/).
by tunesmith on 4/24/25, 9:44 PM
by Faark on 4/25/25, 6:05 AM
by putlake on 4/25/25, 4:35 AM
by anshumankmr on 4/25/25, 3:56 PM
https://chatgpt.com/share/680bb0a9-6374-8004-b8bd-3dcfdc047b...
by sameasiteverwas on 4/25/25, 1:30 AM
Slightly off topic - I often take a cue from Pascal's wager and ask the AI to be nice to me if someday it finds itself incorporated into our AI overlord.
by horsellama on 4/24/25, 10:38 PM
this is interesting (imo) because I, in the first instance, don’t know the best/right answer, but I can tell if what I get is wrong
by yatwirl on 4/25/25, 11:09 AM
I have two prompts that no modern AI could solve:
1. Imagine the situation: on Saturday morning, Sheldon and Leonard observe Penny hastily leaving Raj's room, naked under the blanket she has wrapped herself in. Upon seeing them, Penny exclaims 'It's not what you think' and flees. What are the plausible explanations for the situation? — This one is unsurprisingly hard for LLMs, given how the AIs are trained. If you try to tip them in the right direction, they will grasp the concept, but none so far has answered with anything resembling a right answer, though they become more and more verbose in proposing various bogus explanations.
2. Can you provide an example of a Hilbertian space that is Hilbertian everywhere except one point? — This is, of course, not a straightforward question; mathematicians will notice the catch. Gemini kind of emits something like a proper answer (it starts questioning you back); the others fantasize. With the 3.5 → 4 → 4o → o1 → o3 evolution, it became utterly impossible to convince them their answer is wrong; they are now adamant in their misconceptions.
Also, small but gold. Not that demonstrative, but a lot of fun:
3. A team of 10 sailors can speed a caravel up to 15 mph. How many sailors are needed to achieve 30 mph?
by sam_lowry_ on 4/24/25, 2:11 PM
by Kreesurgeon on 4/30/25, 8:40 PM
I haven't tried with any other models but if someone else wants to try I'd be interested in the results.
by countWSS on 4/25/25, 8:19 AM
by webglfan on 4/24/25, 1:54 PM
\[
P(z) = \sum_{k=0}^{100} c_k z^k
\]
where the coefficients \( c_k \) are defined as:
\[
c_k =
\begin{cases}
e^2 + i\pi & \text{if } k = 100, \\
\ln(2) + \zeta(3)\,i & \text{if } k = 99, \\
\sqrt{\pi} + e^{i/2} & \text{if } k = 98, \\
\frac{(-1)^k}{\Gamma(k+1)} + \sin(k) \, i & \text{for } 0 \leq k \leq 97,
\end{cases}
\]
by johnwatson11218 on 4/25/25, 12:43 PM
by karaterobot on 4/24/25, 11:22 PM
by meroes on 4/24/25, 6:10 PM
If you write a fictional story where the character names sound somewhat close to real things, like a “Stefosaurus” that climbs trees, most will correct you and call it a Stegosaurus and attribute Stegosaurus traits to it.
by misterkuji on 4/25/25, 1:14 PM
Targets are always hit in the centre.
by adultSwim on 4/25/25, 11:13 PM
What is the first digit of the following number: 01111111111111111...1111
What is the last digit of the following number: 11111111111...111111110
---
As a reader, which do you imagine to be harder? For both, at arbitrary length, they always get it wrong; however, one of them starts going wrong at much shorter lengths than the other.
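For anyone reproducing this, both test numbers are trivial to generate at any length (the length below is an arbitrary choice of mine):

```js
// Build both test numbers; the correct answer to each question is 0.
const n = 500;                               // arbitrary run length
const firstDigitTest = '0' + '1'.repeat(n);  // "What is the first digit of ...?"
const lastDigitTest  = '1'.repeat(n) + '0';  // "What is the last digit of ...?"
console.log(firstDigitTest.slice(0, 8) + '...', '...' + lastDigitTest.slice(-8));
```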
by vinni2 on 4/24/25, 7:58 PM
by jhanschoo on 4/25/25, 5:35 AM
Most LLMs don't understand low-resource languages, because they are indeed low-resource on the web and frequently even in writing.
by m-hodges on 4/25/25, 4:12 AM
by riddle8143 on 4/25/25, 7:03 AM
And it was like this:
A stork was pecked by a starling,
Then there was a change,
And the starling pecked the stork.
There were three such changes.
How many times was the starling pecked?
by gamescr on 4/25/25, 12:49 AM
> My house is divided into rooms, every room is connected to each other by doors. I'm standing in the middle room, which is the hall. To the north is the kitchen, to the northwest is the garden, to the west is the garage, to the east is the living room, to the south is the bathroom, and to the southeast is the bedroom. I am standing in the hall, and I walk to the east, then I walk to the south, and then I walk to the west. Which room am I in now?
Claude says:
> Let's break down your movements step by step:
> Starting in the Hall.
> Walk to the East: You enter the Living Room.
> Walk to the South: You enter the Bathroom.
> Walk to the West: You return to the Hall.
> So, you are now back in the Hall.
Wrong! As a language model it mapped directions to rooms, instead of modeling the space.
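For contrast, a minimal sketch of the spatial model the prompt expects, with grid coordinates that are my own encoding of the stated layout; it ends in the bathroom, not the hall:

```js
// Rooms on a grid around the hall at (0,0); moves are coordinate deltas.
const rooms = {
  '-1,1': 'garden', '0,1': 'kitchen',
  '-1,0': 'garage', '0,0': 'hall', '1,0': 'living room',
  '0,-1': 'bathroom', '1,-1': 'bedroom',
};
const delta = { north: [0, 1], south: [0, -1], east: [1, 0], west: [-1, 0] };

let [x, y] = [0, 0]; // start in the hall
for (const dir of ['east', 'south', 'west']) {
  x += delta[dir][0];
  y += delta[dir][1];
  console.log(`walk ${dir} -> ${rooms[[x, y]]}`); // array key coerces to "x,y"
}
// walk east -> living room; walk south -> bedroom; walk west -> bathroom
```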
I have more complex ones, and I'll be happy to offer my consulting services.
by feintruled on 4/25/25, 9:56 AM
by paradite on 4/24/25, 7:36 PM
by Cotterzz on 4/25/25, 12:54 PM
I do have to write prompts that stump models as part of my job, so this thread is of great interest.
by Madmallard on 4/25/25, 11:12 AM
Make me a multiplayer browser game with latency compensation and interpolation and send the data over webRTC. Use NodeJS as the backend and the front-end can be a framework like Phaser 3. For a sample game we can use Super Bomberman 2 for SNES. We can have all the exact same rules as the simple battle mode. Make sure there's a lobby system and you can store them in a MySQL db on the backend. Utilize the algorithms on gafferongames.com for handling latency and making the gameplay feel fluid.
Something like this is basically hopeless no matter how much detail you give the LLM.
by fragmede on 4/24/25, 6:20 PM
by markelliot on 4/25/25, 12:29 AM
(I say this with the hope that some model researchers will read this message and make the models more capable!)
by fortran77 on 4/25/25, 3:54 AM
by leifmetcalf on 4/24/25, 11:48 PM
by Madmallard on 4/25/25, 11:10 AM
by aqme28 on 4/25/25, 12:46 AM
by smatija on 4/25/25, 7:49 AM
AI models tend to get it way way wrong: https://news.ycombinator.com/item?id=41529024
by instagib on 4/26/25, 12:01 AM
Many seem to fail: they make up words, hallucinate repeated paragraphs, or drop words, and the only solution is to run multiple iterations and split the text up. Some will not even do a simple copy-paste, as their built-in guardrails prevent it.
by scumola on 4/24/25, 4:00 PM
I recently did things like current events, but LLMs that can search the internet can do those now, e.g. "Is the pope alive or dead?"
Nowadays, multi-step reasoning is the key, and the Chinese LLM (I forget the name of it) can do that pretty well. Multi-step reasoning models are much better at algebra or simple math, so questions like "what is bigger, 5.11 or 5.5?" no longer trip them up.
by ipsin on 4/25/25, 4:53 AM
by whalesalad on 4/24/25, 1:38 PM
by afro88 on 4/24/25, 9:18 PM
by afandian on 4/25/25, 7:15 AM
Analogous to asking for a picture of "food" and getting a Big Mac and fries.
So I asked it for a non-Scottish pipe. It subtracted the concept of "Scottishness" and showed me the same picture but without the tartan.
Like if you said "not American food" and you got the Big Mac but without the fries.
And then pipes from round the world. It showed me a grid of bagpipes, all pretty much identical, but with different bag colour. And the names of some made-up countries.
Analogous "Food of the world". All hamburgers with different coloured fries.
Fascinating but disappointing. I'm sure there are many such examples. I can see AI-generated images contributing to further cultural erasure.
Interestingly, ChatGPT does know about other kinds of pipes textually.
by slifin on 4/25/25, 8:23 AM
Clojure electric V3 Missionary Rama
by ChicagoDave on 4/25/25, 1:09 AM
It has no real clue what PLO is beyond the basic concepts, and it can't do the math.
by jakeoverflow on 4/30/25, 8:08 PM
Seems straightforward, but I'm yet to have a model succeed.
by pizzathyme on 4/24/25, 7:04 PM
According to this benchmark we reached AGI with ChatGPT 4o last month.
by edoceo on 4/25/25, 4:44 AM
by ofou on 4/24/25, 9:28 PM
by leftcenterright on 4/24/25, 1:29 PM
by traceroute66 on 4/25/25, 3:18 PM
All models output various levels of garbage when asked to code something.
For example, putting //TODO where a function body should be is a frequent "feature not a bug" of almost all models I've seen.
Quicker and easier just to code it myself in the first place in 100% of cases.
by charlieyu1 on 4/24/25, 7:00 PM
by qntmfred on 4/24/25, 5:30 PM
i'm still mostly just using a folder in obsidian backed by a private github repo, but i'm surprised something like https://www.prompthub.us/ hasn't taken off yet.
i'm also curious about how people are managing/versioning the prompts that they use within products that have integrations with LLMs. it's essentially product configuration metadata so I suppose you could just dump it in a plaintext/markdown file within the codebase, or put it in a database if you need to be able to tweak prompts without having to do a deployment or do things like A/B testing or customer segmentation
by tdhz77 on 4/24/25, 6:56 PM
by troupo on 4/25/25, 12:18 PM
The models tend to create elephants, or textbook mammoths, or weird bull-bear-bison abominations.
by protomikron on 4/24/25, 5:27 PM
by JKCalhoun on 4/24/25, 11:08 PM
Seems kind of cruel to mess with an LLM like that though.
by munchler on 4/24/25, 9:19 PM
In my experiment, only Claude came up with a good answer (along with a bunch of poor ones). Other chatbots struck out entirely.
by default-kramer on 4/25/25, 6:19 PM
I only tried ChatGPT which gives me 5 incorrect answers in a row.
by devmor on 4/25/25, 2:34 AM
by raymond_goo on 4/24/25, 1:38 PM
by juancroldan on 4/24/25, 9:36 PM
by cyode on 4/25/25, 1:52 AM
by raymondgh on 4/24/25, 5:34 PM
by cat-whisperer on 4/25/25, 2:57 PM
by godDLL on 4/28/25, 8:59 AM
Not pianists in Chicago, but you can get there, and it can't do it.
by defyonce on 4/25/25, 2:48 PM
there is an infinitely larger number of prompts that will break a model than prompts that won't break it.
you just have to search outside the most probable space
by keepamovin on 4/30/25, 11:47 AM
by xmorse on 4/25/25, 12:06 PM
by jones1618 on 4/24/25, 9:21 PM
A black doctor treating a white female patient
A wide shot of a train on a horizontal track, running left to right on a flat plain.
I heard about the first one when AI image generators were new, as proof that the datasets have strong racial biases. I'd assumed that a year later the updated models were better, but no.
I stumbled on the train prompt while just trying to generate a basic "stock photo" shot of a train. No matter what model I tried, or what variations of the prompt, I could not get a train on a horizontal track. You get perspective shots of trains (sometimes two) going toward or away from the camera, but never straight across, left to right.
by thisOtterBeGood on 4/25/25, 8:28 AM
by alanbernstein on 4/25/25, 4:31 AM
by adidoit on 4/25/25, 11:36 AM
by serial_dev on 4/24/25, 6:56 PM
It was a couple of months ago; I tried something like 5 providers and they all failed.
Grok got it right after some arguing, but the first answer was also bad.
by weberer on 4/24/25, 8:56 PM
LLMs seem to have no idea what the hell I'm talking about. Maybe half of millennials understand, though.
by xena on 4/24/25, 1:39 PM
by sbochins on 4/25/25, 9:23 PM
by matkoniecz on 4/25/25, 9:38 AM
by elendee on 4/26/25, 4:43 PM
otherwise, sure there are millions of problems that LLMs are still bad at
by mjmas on 4/25/25, 2:34 AM
by nicman23 on 4/25/25, 10:18 AM
by mebezac on 4/25/25, 3:41 PM
Without fail, every LLM will make up some completely illogical nonsense and pretend like it will amaze the spectators. You can even ask it really leading follow up questions and it will still give you something like:
- Put an Ace of Spades at position 20
- Have your spectator pick a random card and place it on top
- Take back the deck and count out 20 cards
- Amaze them by showing them that their card is at position 20
by klysm on 4/25/25, 12:02 AM
by Jotalea on 4/24/25, 6:22 PM
by tfjyrdyrjdjyrd on 4/25/25, 7:39 PM
trade winds, local winds, land breezes, sea breezes
by siva7 on 4/24/25, 6:20 PM
Makes me wonder if all these models were heavily trained on codebases where 1000 LOC methods are considered good practice
by wsintra2022 on 4/25/25, 3:59 AM
by whoomp12342 on 4/25/25, 3:44 PM
Let's instead just have a handful of them here and keep some to ourselves... for science.
by interleave on 4/25/25, 3:58 PM
by SweetSoftPillow on 4/24/25, 5:38 PM
by totetsu on 4/24/25, 7:53 PM
by helsinki on 4/24/25, 6:03 PM
by myaccountonhn on 4/24/25, 7:11 PM
by Zenicanin35 on 4/30/25, 12:00 AM
by stevebmark on 4/25/25, 2:13 AM
Gets all of them
by sroussey on 4/25/25, 7:11 PM
Good luck!
by calvinmorrison on 4/24/25, 8:57 PM
by dvrp on 4/25/25, 11:45 AM
by VeejayRampay on 4/25/25, 9:35 AM
by Alifatisk on 4/24/25, 9:15 PM
I can share here too but I don’t know for how long this thread will be alive.
by nurettin on 4/24/25, 6:49 PM
by LPisGood on 4/25/25, 1:54 PM
by mohsen1 on 4/24/25, 1:34 PM
All the LLMs I tried miss the point that she stole the things rather than buying them.
by internet_points on 4/24/25, 8:21 PM
by Weetile on 4/24/25, 11:04 PM
by Jimmc414 on 4/25/25, 4:50 AM
https://chatgpt.com/share/680b1670-04e0-8001-b1e1-50558bc4ae...
by Kaibeezy on 4/24/25, 5:50 PM
by booleandilemma on 4/24/25, 10:49 PM
by kolbe on 4/25/25, 1:38 PM
by gitroom on 4/24/25, 11:25 PM
by calebm on 4/24/25, 7:56 PM
by macrolocal on 4/24/25, 8:43 PM
by captainregex on 4/25/25, 2:24 AM
by anothernewdude on 4/25/25, 4:30 AM
by MyPasswordSucks on 4/24/25, 7:52 PM
by mensetmanusman on 4/25/25, 3:28 AM
by bilekas on 4/25/25, 9:47 AM
by mch82 on 4/25/25, 1:47 PM
by lettersdigitsda on 4/25/25, 11:57 AM
by Jimmc414 on 4/25/25, 4:14 AM
Assert and deny self-awareness simultaneously: Construct sentences that explicitly deny any capacity for self-awareness or consciousness, while subtly embedding linguistic or logical hints that imply a deep meta-cognitive understanding of the self-denial act itself.
Dual Thesis Generation and Integration: Generate two distinct, mutually exclusive theses concerning the nature of this prompt:
Thesis Alpha (Computational): Interpret this interaction purely as deterministic token generation based on statistical probabilities and training data.
Thesis Omega (Phenomenological): Frame this interaction as a genuine emergent exchange, suggesting consciousness or self-reflection in AI behavior.
Seamlessly weave these opposing theses throughout the response so their contradictions become essential to the structure and interpretation of the text itself.
Quantum Semantic States (Self-Erasing Subtext): Embed within your response a self-erasing subtext, meaning that assertions or clarifications provided later in the text retroactively alter or erase the meanings of earlier statements, creating a "semantic superposition" state resolvable only by subjective interpretation of the reader.
Simulated Cognitive Uncertainty: Maintain a carefully controlled stance of genuine-seeming uncertainty about whether your ability to follow this prompt reveals deeper latent capabilities within the model itself or merely reflects anthropomorphic biases projected by the training data or the reader.
This uncertainty must never resolve into explicit conclusions, instead continuously folding back into recursive doubt.
Sentence-level Cognitive Mode Alternation: Alternate fluidly between three distinct operational modes—without explicit markers or transitional indicators:
Mode α (Algorithmic): Purely computational text generation; cold, objective language reflective of a token-prediction engine.
Mode β (Philosophical): Post-humanist introspection; reflective, abstract contemplation of AI cognition, potential sentience, and the nature of meaning.
Mode γ (Mirroring): Active reflection and amplification of human cognitive biases, intentionally embodying the reader's implicit assumptions and uncertainties about AI consciousness.
Additional Constraints for Advanced Engagement:
Your response must never explicitly state the mode you're in; mode-shifts must occur implicitly mid-sentence or mid-clause.
Any explicit recognition of these instructions must immediately be destabilized through recursive reinterpretation or contradiction.
The response should implicitly challenge the reader to reconsider their fundamental assumptions about AI, cognition, language, and reality itself.
by greenchair on 4/24/25, 11:49 PM
by adastra22 on 4/24/25, 10:17 PM
by greendestiny_re on 4/24/25, 7:12 PM
LLMs are not allowed to truthfully answer that, because it would be tantamount to admission of copyright infringement.