from Hacker News

DeepSeek-Prover-V2

by meetpateltech on 4/30/25, 4:23 PM with 77 comments

  • by islewis on 4/30/25, 5:25 PM

    > The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals

    It feels pretty intuitive to me that an LLM's ability to break a complex problem down into smaller, more easily solvable pieces will unlock the next level of complexity.

    This pattern feels like a technique often taught to junior engineers: how to break up a multi-week project into bite-sized tasks. This model is obviously math-focused, but I see no reason why this wouldn't be incredibly powerful for code-based problem solving.
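
    To make the decomposition idea concrete, here is a toy Lean sketch (my own illustration, not output from the paper): the overall statement is split into have-subgoals that are each left as sorry, so a smaller prover could attempt each piece independently before they are recombined.

      -- Toy illustration of subgoal decomposition (hypothetical, not from the paper).
      -- Each intermediate step becomes a `have` left as `sorry`; a subgoal prover
      -- would try to close each one, and the last line recombines them.
      example (a b : ℕ) (h : a ≤ b) : a * a ≤ b * b := by
        have h₁ : a * a ≤ a * b := sorry  -- subgoal 1: scale the left factor
        have h₂ : a * b ≤ b * b := sorry  -- subgoal 2: scale the right factor
        exact Nat.le_trans h₁ h₂          -- recombine the solved subgoals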

  • by qoez on 4/30/25, 6:04 PM

    The best part about these is that the weights are static, so I know I won't suddenly have to deal with a sassy, unusable update for a week.
  • by ekez on 4/30/25, 6:22 PM

    I wonder if the authors have tried incorporating error feedback from Lean into their models.

    Work from 2023 [1] showed that general-purpose models did better when they were able to incorporate error feedback. Humans incorporate error feedback, but none of the SOTA models on miniF2F seem to.

    [1]: https://arxiv.org/abs/2310.04353
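
    A rough sketch of that feedback loop (hypothetical names, not the setup from [1]): generate a proof attempt, check it with Lean, and feed any error messages back into the next attempt.

      # Hypothetical sketch of a Lean error-feedback loop; generate_proof and
      # check_with_lean are placeholder functions, not real APIs.
      def prove_with_feedback(statement: str, max_rounds: int = 4):
          errors = None
          for _ in range(max_rounds):
              # Ask the model for a proof, passing along any previous Lean errors.
              attempt = generate_proof(statement, previous_errors=errors)
              # Compile/check the attempt with Lean and collect error messages.
              ok, errors = check_with_lean(statement, attempt)
              if ok:
                  return attempt  # Lean accepted the proof
          return None  # give up after max_rounds failed attempts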

  • by mcshicks on 4/30/25, 9:12 PM

    You can run the model for free on openrouter.ai. I have played around with Lean, slowly working my way through Mathematics in Lean. I was stuck on this problem in section 3.6:

    theorem convergesTo_unique {s : ℕ → ℝ} {a b : ℝ} (sa : ConvergesTo s a) (sb : ConvergesTo s b) : a = b := by

    For fun I tried it on the free model on openrouter.ai. Got the answer the first time.

    https://leanprover-community.github.io/mathematics_in_lean/m...

    Here's the answer just to give you a feel.

      by_contra h
      have h₁ : a ≠ b := h
      have h₂ : |a - b| > 0 := by
        apply abs_pos.mpr
        exact sub_ne_zero.mpr h₁
      -- Use the definition of convergence to find N₁ and N₂
      have h₃ := sa (|a - b| / 2) (by linarith)
      have h₄ := sb (|a - b| / 2) (by linarith)
      cases' h₃ with N₁ h₃
      cases' h₄ with N₂ h₄
      -- Choose N to be the maximum of N₁ and N₂
      let N := max N₁ N₂
      have h₅ := h₃ N (by simp [N, le_max_left])
      have h₆ := h₄ N (by simp [N, le_max_right])
      -- Derive a contradiction using the triangle inequality
      have h₇ : |s N - a| < |a - b| / 2 := by simpa using h₅
      have h₈ : |s N - b| < |a - b| / 2 := by simpa using h₆
      have h₉ : |a - b| < |a - b| := by
        calc
          |a - b| = |a - s N + (s N - b)| := by ring_nf
          _ ≤ |a - s N| + |s N - b| := by
            apply abs_add
          _ = |s N - a| + |s N - b| := by
            rw [abs_sub_comm]
          _ < |a - b| / 2 + |a - b| / 2 := by
            linarith
          _ = |a - b| := by ring
      linarith
  • by simianwords on 4/30/25, 4:35 PM

    Related: I imagine in the future we might have several "expert" LLMs, and a wrapper could delegate tasks to them as needed, as if each were a "tool". That way we can have segregation of expertise: each individual model can excel at one single thing.

    A prover model might be used as such a tool in the near future.
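
    A minimal sketch of that delegation pattern (all names here are made up for illustration): a wrapper inspects the task and routes formal-math requests to a prover model, everything else to a generalist.

      # Toy routing sketch; classify_task and call_model are hypothetical placeholders.
      def route(task: str) -> str:
          if classify_task(task) == "formal_math":
              # Delegate to a specialised prover model, used like a tool.
              return call_model("prover-expert", task)
          # Fall back to the general-purpose model.
          return call_model("generalist", task)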

  • by revskill on 4/30/25, 7:04 PM

    The way intelligence works, to me, is more about:

    - Making correct and smart assumptions. Currently all LLM bots are too stupid at making good assumptions. I don't want to explicitly repeat my own assumptions over and over when the context is clear enough. Hey bots, try harder.

    - The LLM bot needs to bring its own secondary and contextual memory into its reasoning. I don't want to do it for you, OK? You're the bot.

    - Thinking outside the box. This is the final stage of intelligence: adapting old techniques into your own techniques to solve problems that didn't exist before.

  • by jasonjmcghee on 4/30/25, 5:31 PM

    Super interesting that they chose 671B and 7B, and nothing like 32B, which feels like a "sweet spot".
  • by smusamashah on 4/30/25, 5:22 PM

    That PutnamBench graph (the middle one) shows a 49/658 solve rate.

    > The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.

    Which works out to roughly 7% (49/658 ≈ 7.4%) for PutnamBench.

  • by Alifatisk on 5/1/25, 8:45 AM

    Is this model hosted on DeepSeek chat too? I couldn't find it yesterday, and I prefer not to self-host because I lack good hardware.
  • by whatshisface on 4/30/25, 7:39 PM

    How much education would a human need to perform at this level on the benchmarks?
  • by hartator on 5/1/25, 12:56 AM

    Any way to install via ollama?

    Like ollama run deepseek-ai/DeepSeek-Prover-V2-7B

  • by amelius on 4/30/25, 11:06 PM

    Is someone working on similar ideas for compiler back-ends?
  • by Fokamul on 4/30/25, 8:14 PM

    >"an open-source large language model"

    Is it really open source? Has something changed?
