from Hacker News

DeepSeek-Prover-V2

by meetpateltech on 4/30/25, 4:23 PM with 77 comments

  • by islewis on 4/30/25, 5:25 PM

    > The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals

    It feels pretty intuitive to me that an LLM's ability to break a complex problem down into smaller, more easily solvable pieces will unlock the next level of complexity.

    This pattern feels like a technique often taught to junior engineers: how to break up a multi-week project into bite-sized tasks. This model is obviously math-focused, but I see no reason why this wouldn't be incredibly powerful for code-based problem solving.
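
    To make the decomposition idea concrete, here is a toy Lean sketch (my own illustration, not output from the paper): the overall statement is split into have-subgoals that are each left as sorry, so a smaller prover could attempt each piece independently before they are recombined.

      -- Toy illustration of subgoal decomposition (hypothetical, not from the paper).
      -- Each intermediate step becomes a `have` left as `sorry`; a subgoal prover
      -- would try to close each one, and the last line recombines them.
      example (a b : ℕ) (h : a ≤ b) : a * a ≤ b * b := by
        have h₁ : a * a ≤ a * b := sorry  -- subgoal 1: scale the left factor
        have h₂ : a * b ≤ b * b := sorry  -- subgoal 2: scale the right factor
        exact Nat.le_trans h₁ h₂          -- recombine the solved subgoals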

  • by qoez on 4/30/25, 6:04 PM

    The best part about these is that the weights are static, so I know I won't suddenly have to deal with a sassy, unusable update for a week.
  • by ekez on 4/30/25, 6:22 PM

    I wonder if the authors have tried incorporating error feedback from Lean into their models.

    Work from 2023 [1] showed that general-purpose models did better when they were able to incorporate error feedback. Humans incorporate error feedback, but none of the SOTA models on miniF2F seem to.

    [1]: https://arxiv.org/abs/2310.04353
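
    A rough sketch of that feedback loop (hypothetical names, not the setup from [1]): generate a proof attempt, check it with Lean, and feed any error messages back into the next attempt.

      # Hypothetical sketch of a Lean error-feedback loop; generate_proof and
      # check_with_lean are placeholder functions, not real APIs.
      def prove_with_feedback(statement: str, max_rounds: int = 4):
          errors = None
          for _ in range(max_rounds):
              # Ask the model for a proof, passing along any previous Lean errors.
              attempt = generate_proof(statement, previous_errors=errors)
              # Compile/check the attempt with Lean and collect error messages.
              ok, errors = check_with_lean(statement, attempt)
              if ok:
                  return attempt  # Lean accepted the proof
          return None  # give up after max_rounds failed attempts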

  • by mcshicks on 4/30/25, 9:12 PM

    You can run the model for free on openrouter.ai. I have played around with Lean, slowly working my way through Mathematics in Lean. I was stuck on this problem in section 3.6:

    theorem convergesTo_unique {s : ℕ → ℝ} {a b : ℝ} (sa : ConvergesTo s a) (sb : ConvergesTo s b) : a = b := by

    For fun I tried it on the free model on openrouter.ai. Got the answer the first time.

    https://leanprover-community.github.io/mathematics_in_lean/m...

    Here's the answer just to give you a feel.

      by_contra h
      have h₁ : a ≠ b := h
      have h₂ : |a - b| > 0 := by
        apply abs_pos.mpr
        exact sub_ne_zero.mpr h₁
      -- Use the definition of convergence to find N₁ and N₂
      have h₃ := sa (|a - b| / 2) (by linarith)
      have h₄ := sb (|a - b| / 2) (by linarith)
      cases' h₃ with N₁ h₃
      cases' h₄ with N₂ h₄
      -- Choose N to be the maximum of N₁ and N₂
      let N := max N₁ N₂
      have h₅ := h₃ N (by simp [N, le_max_left])
      have h₆ := h₄ N (by simp [N, le_max_right])
      -- Derive a contradiction using the triangle inequality
      have h₇ : |s N - a| < |a - b| / 2 := by simpa using h₅
      have h₈ : |s N - b| < |a - b| / 2 := by simpa using h₆
      have h₉ : |a - b| < |a - b| := by
        calc
          |a - b| = |a - s N + (s N - b)| := by ring_nf
          _ ≤ |a - s N| + |s N - b| := by
            apply abs_add
          _ = |s N - a| + |s N - b| := by
            rw [abs_sub_comm]
          _ < |a - b| / 2 + |a - b| / 2 := by
            linarith
          _ = |a - b| := by ring
      linarith
  • by simianwords on 4/30/25, 4:35 PM

    Related: I imagine in the future we might have several "expert" LLMs, and a wrapper could delegate tasks to them as needed, as if each were a "tool". That way we can have segregation of expertise: each individual model can excel at one single thing.

    A prover model might be used as such a tool in the near future.
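
    A minimal sketch of that delegation pattern (all names here are made up for illustration): a wrapper inspects the task and routes formal-math requests to a prover model, everything else to a generalist.

      # Toy routing sketch; classify_task and call_model are hypothetical placeholders.
      def route(task: str) -> str:
          if classify_task(task) == "formal_math":
              # Delegate to a specialised prover model, used like a tool.
              return call_model("prover-expert", task)
          # Fall back to the general-purpose model.
          return call_model("generalist", task)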

  • by revskill on 4/30/25, 7:04 PM

    The way intelligence works, to me, is more about:

    - Making correct and smart assumptions. Currently all LLM bots are too stupid at making good assumptions. I don't want to explicitly repeat my own assumptions over and over when the context is clear enough. Hey bots, try harder.

    - The LLM bot needs to bring its own secondary and contextual memory into its reasoning. I don't want to do it for you, OK? You're the bot.

    - Thinking outside the box. This is the final stage of intelligence: adapting old techniques into your own techniques to solve problems that didn't exist before.

  • by jasonjmcghee on 4/30/25, 5:31 PM

    Super interesting that they chose 671B and 7B, and nothing like 32B, which feels like a "sweet spot".
  • by smusamashah on 4/30/25, 5:22 PM

    That PutnamBench graph (the middle one) shows a 49/658 solve rate.

    > The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench.

    Which works out to roughly 7% (49/658 ≈ 7.4%) for PutnamBench.

  • by Alifatisk on 5/1/25, 8:45 AM

    Is this model hosted on DeepSeek chat too? I couldn't find it yesterday, and I prefer not to self-host because I lack good hardware.
  • by whatshisface on 4/30/25, 7:39 PM

    How much education would a human need to perform at this level on the benchmarks?
  • by hartator on 5/1/25, 12:56 AM

    Any way to install via ollama?

    Like ollama run deepseek-ai/DeepSeek-Prover-V2-7B

  • by amelius on 4/30/25, 11:06 PM

    Is someone working on similar ideas for compiler back-ends?
  • by Fokamul on 4/30/25, 8:14 PM

    >"an open-source large language model"

    Is it really open source? Has something changed?
