by timsuchanek on 1/29/25, 4:21 AM with 1541 comments
by udev on 1/29/25, 4:25 AM
by Imnimo on 1/29/25, 5:13 PM
"DeepSeek trained on our outputs and that's not fair because those outputs are ours, and you shouldn't take other peoples' data!" This is obviously extremely silly, because that's exactly how OpenAI got all of its training data in the first place - by scraping other peoples' data off the internet.
"DeepSeek trained on our outputs, and so their claims of replicating o1-level performance from scratch are not really true" This is at least plausibly a valid claim. The DeepSeek R1 paper shows that distillation is really powerful (e.g. they show Llama models get a huge boost by finetuning on R1 outputs), and if it were the case that DeepSeek were using a bunch of o1 outputs to train their model, that would legitimately cast doubt on the narrative of training efficiency. But that's a separate question from whether it's somehow unethical to use OpenAI's data the same way OpenAI uses everyone else's data.
by bilekas on 1/29/25, 3:29 PM
Well I think DeepSeek releasing it open source and on an MIT license will rally the big talent. The open sourcing of a new technology has always driven progress in the past.
The last paragraph is also where OpenAI seems to be focusing its efforts:
> we engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models ..
> ... we are working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to take US technology.
So they'll go for getting DeepSeek banned like TikTok was, now that a precedent has been set?
by mrkpdl on 1/29/25, 10:12 PM
I don’t buy the argument that distilling from o1 undermines DeepSeek’s claims around expense at all. Just as OpenAI used the tools ‘available to them’ to train their models (e.g. everyone else’s data), R1 is using today’s tools.
Does OpenAI really have a moral or ethical high ground here?
by blast on 1/29/25, 6:01 PM
If Deepseek trained off OpenAI, then it wasn't trained from scratch for "pennies on the dollar" and isn't the Sputnik-like technical breakthrough that we've been hearing so much about. That's the news here. Or rather, the potential news, since we don't know if it's true yet.
by ok123456 on 1/29/25, 3:32 PM
The books were all in epub format, converted, cleaned to plain text, and hosted on a public data hoarder site.
by wanderingmoose on 1/29/25, 4:57 PM
A company like OpenAI can put whatever licensing they want in place. But that only matters if they can enforce it. The question is, can they enforce it against deepseek? Did deepseek do something illegal under the laws of their originating country?
I've had some limited exposure to media related licensing when releasing content in China and what is allowed is very different than what is permitted in the US.
The interesting part which points to innovation moving outside of the US is US companies are beholden to strict IP laws while many places in the world don't have such restrictions and will be able to utilize more data more easily.
by glenstein on 1/29/25, 4:13 PM
by daft_pink on 1/29/25, 3:53 PM
AI is probably similar, where Moore’s law and advancement will eventually allow people to run open models locally and bring down the cost of operation. Competition will make it hard for all but one or two players to survive, and most investments in AI by large companies like Nvidia, OpenAI, DeepSeek, etc. will fail to generate substantial wealth, but maybe earn some sort of return, or maybe not.
by me551ah on 1/29/25, 3:47 PM
OpenAI talks a lot about the principles of being Open, while still keeping their models closed and not fostering the open source community or sharing their research. Now when a company distills their models using perfectly allowed methods on the public internet, OpenAI wants to shut them down too?
High time OpenAI changes their name to ClosedAI
by readyplayernull on 1/29/25, 3:27 PM
https://www.wired.com/2011/02/bing-copies-google/
They don't care; T&C and copyright are void unless it affects them, and others can go kick rocks. It wouldn't be surprising if they and OpenAI end up in a legal battle over this.
by Ciantic on 1/29/25, 3:26 PM
by dragonwriter on 1/29/25, 6:47 PM
You might want to consider how it applies to this situation.
by mhitza on 1/29/25, 3:32 PM
1. Something I'd expect to happen.
2. Lived through a similar scenario in 2010 or so.
Early in my professional career I worked for a media company that was scraping other sites (think Craigslist, but for our local market) to republish the content on our competing website. I wasn't working on that specific project, but I did work on an integration on my team's project where the scraping team could post jobs on our platform directly. When others started scraping "our content", there were a couple of urgent all-hands-on-deck meetings scheduled, with a high level of disbelief.
by 1970-01-01 on 1/29/25, 3:11 PM
by olalonde on 1/30/25, 2:40 AM
> We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.
> We will actively cooperate with other research and policy institutions; we seek to create a global community working together to address AGI’s global challenges.
/s, we all know what their true mission is...
by sho_hn on 1/29/25, 3:15 PM
by concerndc1tizen on 1/29/25, 4:51 PM
That would be a dangerous precedent to establish.
If it's a terms of service violation, I guess they're within their rights to terminate service, but what other recourse do they have?
Other than that, perhaps this is just rhetoric aimed at introducing restrictions in the US, to prevent access to foreign AI, to establish a national monopoly?
by thorum on 1/29/25, 3:39 PM
The humor/hypocrisy of the situation aside, it does seem to be true that OpenAI is consistently the one coming up with new ideas first (GPT 4, o1, 4o-style multimodality, voice chat, DALL-E, …) and then other companies reproduce their work, and get more credit because they actually publish the research.
Unfortunately for them it’s challenging to profit in the long term from being first in this space and the time it takes for each new idea to be reproduced is getting shorter.
by divbzero on 1/29/25, 8:08 PM
It is, of course, quite ironic for OpenAI to indiscriminately scrape the entire web and then complain about being scraped themselves.
[1]: https://searchengineland.com/google-bing-is-cheating-copying...
by esskay on 1/30/25, 12:04 AM
by bhouston on 1/29/25, 3:07 PM
And the US government is also likely itching to reduce the power of Chinese AI companies that could outcompete US rivals (similar to the treatment of BYD, TikTok, solar panel manufacturers, network equipment manufacturers, etc.), so expect sweeping legislation that blocks access to all Chinese AI endeavours in both the US and then soon US allies/the West (via US pressure).
The likely legislation will be on the surface justified both by security concerns and by intellectual property concerns, but ultimately it will be motivated by winning the economic competition between China and the US and it will attempt to tilt the balance via explicitly protectionist policies.
by oli5679 on 1/30/25, 3:17 PM
A. Below is a list of OpenAI's initial hires from Google. It's implausible to me that there wasn't quite significant transfer of Google IP.
B. Google published extensively, including the famous 'Attention Is All You Need' paper, but OpenAI, despite its name, has not explained the breakthroughs that enabled o1. It has also switched from a charity to a for-profit company.
C. Now this company, with a group of smart, unknown machine learning engineers, presumably paid a fraction of what OpenAI pays, has created a model far cheaper and has openly published the weights and many methodological insights, which will be used by OpenAI.
1. Ilya Sutskever – One of OpenAI’s co-founders and its former Chief Scientist. He previously worked at Google Brain, where he contributed to the development of deep learning models, including TensorFlow.
2. Jakub Pachocki – Formerly OpenAI’s Director of Research, he played a major role in the development of GPT-4. He had a background in AI research that overlapped with Google’s fields of interest.
3. John Schulman – Co-founder of OpenAI, he worked on reinforcement learning and helped develop Proximal Policy Optimization (PPO), a method used in training AI models. While not a direct Google hire, his work aligned with DeepMind’s research areas.
4. Jeffrey Wu – One of the key researchers involved in fine-tuning OpenAI’s models. He worked on reinforcement learning techniques similar to those developed at DeepMind.
5. Girish Sastry – Previously involved in OpenAI’s safety and alignment work, he had research experience that overlapped with Google’s AI safety initiatives.
by karim79 on 1/30/25, 1:43 AM
Just weeks later, I discovered that others were pulling off similar calculations! They were making great bread with ease and not having to resort to notebooks and calculators! The horror! I can't believe that said close friend of mine would actually share those highly hydraty mathematical formulas with other humans without first requesting my consent </sarc>.
Could it be, that this stuff just ends up in the dumpster of "sorry you can't patent math" or the like?
by osigurdson on 1/29/25, 3:09 PM
by TylerJaacks on 1/30/25, 2:16 AM
by nelblu on 1/29/25, 3:42 PM
by alasr on 1/30/25, 1:17 AM
> The San Francisco-based ChatGPT maker told the Financial Times it had seen some evidence of “distillation”, which it suspects to be from DeepSeek.
> ...
> OpenAI declined to comment further or provide details of its evidence. Its terms of service state users cannot “copy” any of its services or “use output to develop models that compete with OpenAI”.
OAI, share the evidence with the public, or accept the possibility that your case is not as strong as you're claiming here.
by kranke155 on 1/29/25, 11:42 PM
by glitchc on 1/29/25, 3:55 PM
by kamranjon on 1/30/25, 3:42 AM
The number of training iterations DeepSeek would need to actually learn anything from OpenAI would seem to require an insane number of requests to a non-local AI, which you'd think would be immediately obvious to OpenAI just by looking for suspicious requests?
Am I correct in this assumption or am I missing something? Is it even realistic that something like this is possible without a local model?
by emsign on 1/30/25, 4:55 AM
by fimdomeio on 1/29/25, 9:27 PM
by duchenne on 1/29/25, 11:04 PM
by metaxz on 1/29/25, 6:34 PM
by nbgoodall on 1/29/25, 11:42 PM
by temporallobe on 1/29/25, 3:17 PM
by __MatrixMan__ on 1/29/25, 3:09 PM
by _moof on 1/29/25, 6:45 PM
by crowcroft on 1/29/25, 3:43 PM
Don't get mad about possible data theft, prove it in court.
by jgrall on 1/29/25, 6:22 PM
by karim79 on 1/30/25, 3:05 AM
I just can't even begin to imagine what will come of this ridiculous techno-imperialism/AI arms race, or whatever you want to call it. It should not be too hard for China to create their own ASICs which do the same, and finally be done with this palaver.
by InkCanon on 1/29/25, 3:18 PM
by whatshisface on 1/29/25, 3:05 PM
by cumulative00x on 1/29/25, 9:06 PM
by bicepjai on 1/30/25, 6:07 AM
by nba456_ on 1/29/25, 3:15 PM
by ForHackernews on 1/29/25, 11:00 AM
by hyperbovine on 1/29/25, 6:02 PM
by cratermoon on 1/29/25, 4:42 AM
by vinni2 on 1/29/25, 5:23 AM
by Kiro on 1/29/25, 3:20 PM
I'm not seeing it. I get it, the narrative that OpenAI is getting a taste of their own medicine is funny but this is not serious reporting.
by guybedo on 1/29/25, 6:25 PM
Everybody has evidence OpenAI scraped the internet at a global scale and used terabytes of data it didn't pay for. Newspapers, books, etc...
by sabhiram on 1/31/25, 6:45 AM
What was supposed to be a model, training, and data moat is now reduced to operational cost, at which they are not terribly efficient.
OpenAI has been on a journey to burn as much $ as possible to get as far ahead on those three moats, to the point where decreasing TCO for them on inference was not even relevant - "who cares if you save me 20% of costs when I can raise on a 150b pre money value?".
Well, with their moats disappearing, they will have no choice but to compete on inference cost like everyone else.
by cbracketdash on 1/29/25, 5:19 PM
by JBSay on 1/29/25, 5:15 PM
by jchook on 1/29/25, 5:01 PM
China leads the world in the most cited papers[2]. The US's share of the top 1% highly cited articles (HCA) has declined significantly since 2016 (1.91 to 1.66%), and the same has doubled in China since 2011 (0.66 to 1.28%)[3].
China also leads the world in the number of generative AI patents[4].
1. https://www.bfna.org/digital-world/infographic-ai-research-a...
2. https://www.science.org/content/article/china-rises-first-pl...
3. https://ncses.nsf.gov/pubs/nsb202333/impact-of-published-res...
4. https://www.wipo.int/web-publications/patent-landscape-repor...
by waffletower on 1/29/25, 5:12 PM
by janalsncm on 1/30/25, 10:38 AM
It does matter whether it happened and how much. DeepSeek ran head-to-head comparisons against o1, so it would be pretty reasonable for them to have made API calls, for example.
But also, as the article notes, distillation, supervised fine tuning, and using LLM as a judge are all common techniques in research, which OpenAI knows very well.
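Of the techniques named here, LLM-as-a-judge is perhaps the least self-explanatory: one model grades another model's output against a rubric. A toy sketch of the pattern, where the judge's reply is hard-coded as a stand-in for a real model call:

```python
# Toy LLM-as-a-judge: build a grading prompt, then parse a score out of the
# judge's reply. A real setup would send the prompt to a model API.
def build_judge_prompt(question: str, answer: str) -> str:
    return (
        f"Question: {question}\n"
        f"Candidate answer: {answer}\n"
        "Rate the answer from 1 to 10 and reply as 'Score: N'."
    )

def parse_score(reply: str) -> int:
    # Expect a reply containing 'Score: N'; take the first integer token.
    for token in reply.split():
        if token.isdigit():
            return int(token)
    raise ValueError("no score found in judge reply")

reply = "Score: 8"  # what a judge model might return
print(parse_score(reply))  # 8
```

The parsed scores then serve as training signal or evaluation metric, which is why the technique is so common in research pipelines.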
by hugoromano on 1/29/25, 9:46 PM
by conartist6 on 1/29/25, 6:13 PM
OpenAI seems to feel that way about their use of copyrighted material: since they didn't literally make a copy of the source material, it's totally fair game. It seems like this is the same argument that protects DeepSeek, if indeed they did this. And why not? Reading a lot of books from the library is a way to get smarter, and that's ostensibly the point of libraries.
by adam_arthur on 1/29/25, 7:07 PM
They did the exact same thing with public information. Their model just synthesizes and puts out the same information in a slightly different form.
Next we should sue students for repeating the words of their teachers
by WD-42 on 1/29/25, 3:13 PM
by jongjong on 1/30/25, 12:08 AM
You can't have it both ways... If OpenAI wants to claim that the AI is not repeating content but 'synthesizing' it, in the same way as a human student would do... then I think the same logic should extend to DeepSeek.
Now if OpenAI wants to claim that its own output is in fact copyright-protected, then it seems like it should owe royalty payments to everyone whose content was sourced upstream to build its own training set. Also, synthetic content which is derived from real content should also be factored in.
TBH, this could make a strong case for taxing AI. Like some kind of fee for human knowledge and distributed as UBI. The training data played a key part in this AI innovation.
As an open source coder, I know that my copyrighted code is being used by AI to help other people produce derived code and, by adapting it in this way, it's making my own code less relevant to some extent... In effect, it could be said that my code has been mixed in with the code of other open source developers and weaponized against us.
It feels like it could go either way TBH but there needs to be consistency.
by xinayder on 1/30/25, 9:12 AM
Well, this sounds like they are just crying because they are losing the race so far. Besides, DeepSeek explicitly stated they did a study on distillation using ChatGPT, and then OpenAI is like "oh see guys, they used our models!!!!!"
by highfrequency on 1/29/25, 8:19 PM
by mbowcut2 on 1/29/25, 6:57 PM
by amarcheschi on 1/29/25, 1:04 PM
by elashri on 1/29/25, 3:19 PM
"We didn’t see them when they were stealing, we saw them when they were fighting over what was stolen"
That describes this situation. Although, to be honest, all this aggressive scraping has been noticeable for a while, but only to people who understand it, which is not the majority. Now everyone knows.
by 827a on 1/29/25, 3:10 PM
by ceejayoz on 1/29/25, 3:08 PM
by ranger_danger on 1/29/25, 4:37 PM
Obligatory "Everything is a Remix" https://www.youtube.com/watch?v=X9RYuvPCQUA
by pama on 1/29/25, 3:48 PM
[edit: the above comment was based on sensationalist reporting in the original link and not the current FT article. I still think there is a lot of noise in the news this past week, but it may well be that OpenAI has valid evidence of wrongdoing; I would guess that any such wrongdoing would apply directly to V3 rather than R1-Zero, because o1 does not provide traces and generating synthetic thinking data with 4o may be counterproductive.]
by HarHarVeryFunny on 1/29/25, 4:23 PM
Of course it's possible that DeepSeek used o1 to generate some of this initial bootstrapping data, but it's not obvious. o1 deliberately obfuscates its reasoning process anyway (see the "Hiding the chains of thought" section of OpenAI's "Learning to reason with LLMs" page), such that what you see is an after-the-fact "summary" of what it actually did; so, if DeepSeek did indeed use some of o1's output to train on, it shows that the details of o1's own reasoning process aren't as important as OpenAI thought. It's just having some verified (i.e. leading to a good outcome) reasoning data from any source that matters to get started.
by paul_e_warner on 1/30/25, 12:48 PM
When you take both of these points and put them back to back, a natural answer seems to suggest itself which I'm not sure the authors intended to imply: R1 attempted to use o1 to make its answers seem more human, and as a result it accidentally picked up most of its reasoning capabilities in the process. Is my reading totally off?
by zhenghao1 on 1/31/25, 1:06 PM
by ks2048 on 1/29/25, 6:40 PM
But I wonder: do companies like OpenAI, Google, and Anthropic use each other's models for training? If not, is it because they don't want to or need to, or because they are afraid of breaking the ToS?
by oysmal on 1/30/25, 8:49 AM
by SubiculumCode on 1/29/25, 9:23 PM
by staticelf on 1/29/25, 3:30 PM
by whoknowsidont on 1/29/25, 11:20 PM
It simply doesn't matter. Their methodology works.
by ginkgotree on 1/29/25, 3:45 PM
by deeviant on 1/29/25, 5:01 PM
DeepSeek could simply admit, "Yep, oops, we did it," but argue that they only used the data to train Model X. So, if you want compensation, you can have all the revenue from Model X (which, conveniently, amounts to nothing).
Sure, they then used Model X to train Model Y, but would you really argue that the original copyright holders are entitled to all financial benefits derived from their work—especially when that benefit comes in the form of a model trained on their data without permission?
by thih9 on 1/30/25, 8:33 AM
I hope this is actually true and OpenAI loses its close to monopoly status. Having a for profit entity safeguarding a popular resource like this sounds miserable for everyone else.
At the moment AI looks like typical VC scheme: build something off someone else's work, sell it at cost at first, shove it down everyone's throats and when it's too late, hike the prices. I don't like that.
by nataliste on 1/29/25, 7:38 PM
The Wolf made off to a safe distance, and then said in a much injured tone:
"You have no right to take my property like that!"
The Lion looked back, but as the Wolf was too far away to be taught a lesson without too much inconvenience, he said:
"Your property? Did you buy it, or did the Shepherd make you a gift of it? Pray tell me, how did you get it?"
What is evil won is evil lost.
by JBits on 1/29/25, 3:24 PM
by caseyy on 1/30/25, 5:00 AM
If one could effectively patent software inventions, this would kill many industries, from video games (that all have mechanics of other games in them) to computing in general (fast algorithms, etc). Let’s hope no one gets ideas like that…
Granted, it would be ineffective in competing against China’s tech industry. But less effective laws have been lobbied through in the past.
by ripped_britches on 1/30/25, 12:59 AM
Also it’s pretty good confirmation that synthetic data is a valid answer to the data wall problem (non-problem).
by SilverBirch on 1/29/25, 3:27 PM
by davesque on 1/29/25, 8:17 PM
by zoba on 1/29/25, 3:44 PM
by MagicMoonlight on 1/30/25, 6:57 AM
by DidYaWipe on 1/29/25, 9:12 PM
Objection overruled.
by stevenally on 1/29/25, 5:30 PM
by kapad on 2/2/25, 9:39 AM
But somehow, it's a problem if another model developer distills it's model by training it on OpenAI?
IMO, if the first use is fair, then so is the second use.
by henry_viii on 1/29/25, 8:25 PM
by mkayle on 1/29/25, 11:16 PM
by jasoneckert on 1/29/25, 3:06 PM
by njx on 1/29/25, 3:20 PM
by ijidak on 1/29/25, 10:54 PM
Even if they win the legal case, it means weights can be inferred and improved upon simply by using the output that is also your core value add (e.g. the very output you need to sell to the world).
Their moat is about as strong as KFC's eleven herbs and spices. Maybe less...
by ripped_britches on 1/30/25, 12:56 AM
by gejose on 1/30/25, 6:13 PM
> "Well, Steve… I think it’s more like we both had this rich neighbour named Xerox and I broke into his house to steal the TV set and found out that you had already stolen it."
by krystofee on 1/30/25, 8:29 AM
by wendyshu on 1/29/25, 9:55 PM
by elzbardico on 1/31/25, 12:04 PM
Is OpenAI going to complain about me too?
by top_sigrid on 1/29/25, 2:58 PM
by liendolucas on 1/29/25, 5:02 PM
by rcarmo on 1/29/25, 6:03 PM
by supermatt on 1/29/25, 3:38 PM
They don't specifically name OpenAI, but they refer to "directly prompting models to generate answers with reflection and verification".
by 65 on 1/29/25, 3:34 PM
by elzbardico on 1/31/25, 12:34 PM
The West is run by lawyers, MBAs and salesmen.
This kerfuffle is a delicious study about this.
by pknerd on 1/30/25, 7:16 AM
> OpenAI’s o1 costs $60 per million output tokens; DeepSeek R1 costs $2.19. This nearly 30x difference brought the trend of falling prices to the attention of many people.
From Andrew Ng's recent DeeplearningAI newsletter
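The "nearly 30x" figure checks out against the quoted per-token prices:

```python
# Price ratio between o1 and R1 output tokens, from the quoted figures.
o1_price = 60.00   # USD per million output tokens (o1)
r1_price = 2.19    # USD per million output tokens (R1)
ratio = o1_price / r1_price
print(round(ratio, 1))  # ~27.4, i.e. "nearly 30x"
```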
by zb3 on 1/29/25, 6:09 PM
by alexfromapex on 1/31/25, 2:13 PM
by imchillyb on 1/29/25, 11:56 PM
If they did this, We the People would cover them like we do others. Without it, We the People don't care.
Cry, don't cry, it's meaningless to us.
by TrackerFF on 1/29/25, 6:05 PM
by EGreg on 1/29/25, 11:22 PM
Tell me again how come remixing our data is just dandy, many artists got disrupted — but no one should be able to disrupt OpenAI like that?
by 1propionyl on 1/29/25, 8:06 PM
For local tasks, the deepseek-r1:14b and deepseek-r1:32b distillations immediately replace most of that usage (prior local models were okay, but not consistently good enough). Once there's a "just works" setup for RAG on par with installing ollama (which I doubt is far off), I don't see much reason to continue paying for my subscription.
Sadly, like many others in this thread, I expect under the current administration to see self-hamstringing protectionism further degrade the US's likelihood of remaining a global powerhouse in this space. Betting the farm on the biggest first-mover who can't even keep up with competition, has weak to non-existent network effects (I can choose a different model or service with a dropdown, they're more or less fungible), has no technological moat and spent over a year pushing apocalyptic scenarios to drum up support for a regulatory moat...
...well it just doesn't seem like a great idea to me.
by halyconWays on 1/29/25, 8:02 PM
by deadbabe on 1/29/25, 4:43 PM
by leobg on 1/29/25, 11:03 PM
That is absurd.
Copyright law is designed to strike a balance between two issues. On the one hand, the creator’s personality that’s baked into the specific form of expression. And on the other hand, society’s interest in ideas being circulated, improved, and combined for the common good.
OpenAI built on the shoulders of almost every person that wrote text on a website, authored a book, or shared a video online. Now others build on the shoulders of OpenAI. How should the former be legal but not the latter?
Can’t have it both ways, Sam.
(IAAL, for what it’s worth.)
by whenlambo on 1/29/25, 4:47 PM
> have you been trained using openai?
No, I have not been trained using technology or data from OpenAI. I am an artificial intelligence model developed by Yandex, called YandexGPT. My “training” is based on proprietary algorithms and data that was collected and processed by the Yandex team. While I have similarities with other models such as those developed by OpenAI (e.g. GPT-3), my training was conducted independently and uses unique approaches specific to Yandex. This includes using Russian-language data as well as other sources to provide a better understanding of context and provide useful information to users. If you have questions on topics related to AI technologies or anything else, I'd be happy to help!
by ryao on 1/30/25, 1:26 PM
by curtisblaine on 1/29/25, 4:23 PM
by myflash13 on 1/29/25, 3:42 PM
by WolfOliver on 1/30/25, 7:16 AM
So what is the point if you pay for it and can not use the results how you see fit?
by mkoubaa on 1/29/25, 10:02 PM
by nuc1e0n on 1/29/25, 8:32 PM
by jofzar on 1/30/25, 2:12 AM
by therealpygon on 1/30/25, 1:06 AM
by Digit-Al on 1/29/25, 6:42 PM
Cry me a river.
by aiono on 1/29/25, 3:44 PM
by feverzsj on 1/29/25, 4:02 PM
by rkagerer on 1/30/25, 5:44 AM
by ra7 on 1/29/25, 4:18 PM
by hedayet on 1/29/25, 7:50 PM
But in capitalism, we, the customers aren't going to focus on how models are trained or products are made; we only care about favourable pricing.
A key takeaway for me from this news is the clause in OpenAI's terms and conditions. I mistakenly believed that paying for OpenAI’s API granted full rights to the output, but it turns out we’re only buying specific rights (which is now another reason we're going to start exploring alternatives to OpenAI)
by windex on 1/29/25, 3:46 PM
by LZ_Khan on 1/29/25, 9:03 PM
OpenAI is like a team of fodder monkeys stepping on landmines right now, with the rest of the world waiting behind them.
by the_optimist on 1/29/25, 5:59 PM
by maxglute on 1/29/25, 10:07 PM
by buyucu on 1/29/25, 6:54 PM
They are now upset at a startup that is more loyal to OpenAI's original mission than OpenAI is today.
Please, give me a break.
by aDyslecticCrow on 1/29/25, 6:27 PM
by dlikren on 1/29/25, 4:02 PM
by curvaturearth on 1/29/25, 7:15 PM
by ingohelpinger on 1/30/25, 6:59 AM
by m3kw9 on 1/29/25, 7:43 PM
by mtlmtlmtlmtl on 1/29/25, 8:00 PM
If all they really have is some API requests, even assuming they're real and originated by DeepSeek, that's very far from proof that any of it was used as training data. And honestly, short of committing crimes against DeepSeek (hacking), I'm not sure how they could even prove that at this point, from their side alone.
And what's even more certain is that a vague insistence that evidence exists, accompanied by a denial to shed any more light on the specifics, is about as informative as saying nothing at all. It's not like OpenAI and Microsoft have a habit of transparency and honesty in their communication with the public, as proven by an endless laundry list of dishonest and subversive behaviour.
In conclusion, I don't see why I should give this any more credence than I would a random anon on 4chan claiming a pizza place in Washington DC is the centre of a child sex trafficking ring.
P.S: And to be clear, I really don't care if it is true. If anything, I hope it is; it would be karmic justice at its finest.
by juliuskiesian on 1/31/25, 12:01 PM
by pluc on 1/29/25, 4:07 PM
by nachox999 on 1/29/25, 7:10 PM
by sylware on 1/29/25, 3:11 PM
by oxqbldpxo on 1/29/25, 4:01 PM
by beardedwizard on 1/29/25, 3:44 PM
by sirolimus on 1/29/25, 11:01 PM
by game_the0ry on 1/29/25, 5:38 PM
Ironic.
by asdefghyk on 1/29/25, 7:37 PM
Well who would have thought that?
by wnevets on 1/29/25, 5:34 PM
by colonelspace on 1/29/25, 6:05 PM
by sgammon on 1/29/25, 11:50 PM
by dbg31415 on 1/29/25, 11:30 PM
Back in college, a kid in my dorm had a huge MP3 collection. And he shared it out over the network, and people were all like, "Man, Patrick has an amazing MP3 collection!" And he spent hours and hours ripping CDs from everyone so all the music was available on our network.
Then I remember another kid coming in, with a bigger hard drive, and he just copied all of Patrick's MP3 collection and added a few more to it. Then ran the whole thing through iTunes to clean up names and add album covers. It was so cool!
And I remember Patrick complained, "He stole my MP3 collection!"
Anyway, this story sums up how I feel about Sam Altman here. He's not Metallica, he's Patrick.
https://www.npr.org/2023/12/27/1221821750/new-york-times-sue...
by baggiponte on 1/29/25, 6:44 PM
by Jotalea on 1/29/25, 6:56 PM
by itamarcode on 1/29/25, 4:30 PM
by lawlessone on 1/29/25, 3:04 PM
by gostsamo on 1/29/25, 3:45 PM
by spacecadet on 1/29/25, 5:23 PM
So long HN and thanks for all the fish?
by low_tech_love on 1/31/25, 6:33 AM
It’s really interesting that the same people who defend liberal capitalism at its extreme and praise competition as its most important component (with which I don’t disagree) are the same ones who’ll promptly attempt to destroy the system and the competition as soon as they are in such a position.
by geerlingguy on 1/29/25, 4:19 PM
by andy_ppp on 1/29/25, 3:42 PM
by HPsquared on 1/29/25, 6:24 PM
by glooglork on 1/30/25, 7:33 AM
by vrighter on 1/29/25, 9:38 AM
by beezlewax on 1/29/25, 11:18 PM
by moralestapia on 1/29/25, 7:09 PM
Got downvoted to oblivion by people who haven't been told what to think by MSM yet. Now it's on FT and everywhere, good, what matters is that truth comes out eventually.
I don't take any sides and think what DeepSeek did is fair play. However, what I do find harmful about this is: what incentive would company A have to spend billions training a new frontier model if all of that could then be reproduced by company B at a fraction of the cost?
by iimaginary on 1/30/25, 9:39 AM
by mercurialsolo on 1/30/25, 1:28 AM
by hsuduebc2 on 1/29/25, 4:56 PM
by rachofsunshine on 1/29/25, 3:37 PM
Yet another of a series of recent lessons in listening to people - particularly powerful people focused on PR - when they claim a neutral moral principle for what happens to be pragmatically convenient for them. A principle applied only when convenient is not a principle at all, it's just the skin of one stretched over what would otherwise be naked greed.
by insane_dreamer on 1/29/25, 3:52 PM
So we're going to reverse our goals to reduce emissions and fossil fuels in order to hopefully save future generations from the worst effects of climate change, in the name of being able to do what, exactly, that is actually benefiting humanity? Boost corporate profits by reducing labor?
by a2128 on 1/30/25, 2:10 AM
by belter on 1/29/25, 9:10 AM
by flybarrel on 1/29/25, 5:44 PM
by delusional on 1/29/25, 4:51 PM
Yes, DeepSeek is going to steal all of your data. OpenAI would do the same. Yes, the CCP is going to get access to your data and use it to decide if you get to visit or whatever. The White House does the same.
by kozikow on 1/30/25, 3:31 PM
So even if you put some "watermarks" in your AI generation, it's a plausible defense to find publicly posted content with those watermarks.
Maybe it's explained in the article, but I can't access it, as it's paywalled.
by buildsjets on 1/29/25, 9:47 PM
by B1FF_PSUVM on 1/29/25, 7:36 PM
by pknerd on 1/30/25, 6:55 AM
by typon on 1/29/25, 3:15 PM
by xyst on 1/29/25, 11:57 PM
If they had a case, they wouldn't be using the FT. They would be filing a court case. Although that would open them up to discovery, and the nasty shit ClosedAI has been up to would be fair game.
by ysofunny on 1/29/25, 10:06 PM
and this is good because any alternatives I can think of are older-school fighting
modern war is steeped in symbolism, but the contest is still there
e.g. whose dong is bigger? Xi Jinping's or Donald Trump's
by almostdeadguy on 1/29/25, 10:43 PM
by freejazz on 1/31/25, 4:16 PM
by lxe on 1/29/25, 5:10 PM
by animitronix on 2/1/25, 2:13 AM
by rochak on 1/31/25, 3:55 AM
by daft_pink on 1/29/25, 3:36 PM
by udev4096 on 1/30/25, 2:08 PM
by schaefer on 1/29/25, 8:10 PM
Right?
by trkaky on 1/30/25, 11:38 AM
by asdfasdf1 on 1/29/25, 9:50 PM
by nshung on 1/29/25, 5:44 PM
by ddingus on 1/30/25, 12:04 AM
Seriously. Given how pretty much all this software was trained, who cares?
I, for one, don't and believe the massive amount of knowledge continues to be of value to many users.
And I find the thought of these models knowing some things they shouldn't very intriguing.
by semking on 1/29/25, 3:06 PM
ClosedAI scraped human content without asking and explained why this was acceptable... but when the outputs of their models are scraped, it is THEIR dataset and this is NOT acceptable!
Oh, the irony! :D
I shared a few screenshots of DeepSeek answering using ChatGPT's output in yesterday's article!
https://semking.com/deepseek-china-ai-model-breakthrough-sec...
by yapyap on 1/29/25, 10:30 PM
DeepSeek just bodied u bro, get back in the lab & create a better AI instead of all this news that isn’t gonna change them having a good AI
by zx10rse on 1/29/25, 9:03 PM
by vjerancrnjak on 1/30/25, 6:21 AM
Let’s race to the bottom.
by vcryan on 1/30/25, 2:43 AM
by jeanlucas on 1/29/25, 4:43 PM
by oatmeal_croc on 1/30/25, 6:32 AM
by paulhart on 1/29/25, 3:33 PM
by sonabinu on 1/29/25, 3:26 PM
by khazhoux on 1/30/25, 9:37 AM
by hello_computer on 1/30/25, 8:50 AM
by josefritzishere on 1/29/25, 6:32 PM
by nottorp on 1/29/25, 3:25 PM
It's okay when you steal worldwide IP to train your "AI".
It's not okay when said stolen IP is stolen from you?
If the Chinese are guilty, then Altman's doom-and-gloom racket is as guilty or even more so, considering they stole from everyone.
by fedeb95 on 1/30/25, 8:10 AM
by mk89 on 1/29/25, 3:57 PM
by asah on 1/29/25, 3:14 PM
by witnesser2 on 1/30/25, 8:22 PM
by pshirshov on 1/30/25, 1:48 AM
by coldpepper on 1/30/25, 1:39 AM
by thumbsup-_- on 1/29/25, 5:29 PM
by coolhand2120 on 1/29/25, 4:42 PM
> You > do you understand copredication?
> Assistant > deepseek-coder-33b-instruct > I'm sorry for any misunderstanding, but as an AI model developed by OpenAI, I don't have personal ...
Yikes! That's pretty damning IMO.
by octacat on 1/29/25, 5:06 PM
by pointedAt on 1/29/25, 4:42 PM
by 52-6F-62 on 1/29/25, 6:07 PM
Yesterday the industry crushed pianos and tools and bicycles and guitars and violins and paint supplies and replaced them with a tablet computer.
Tomorrow we can replace craven venture capitalists and overfed corporate bodies with incestuous LLM’s and call it all a day.
by seydor on 1/29/25, 3:20 PM
by exabrial on 1/30/25, 5:25 AM
by seanp2k2 on 1/30/25, 6:25 AM
by boxedemp on 1/29/25, 8:55 PM
by htrp on 1/29/25, 8:04 PM
by FpUser on 1/29/25, 10:38 PM
by otikik on 1/30/25, 9:01 AM
Oh wait I will ask DeepSeek instead.
by pcthrowaway on 1/29/25, 3:16 PM
by cratermoon on 1/30/25, 3:41 PM
by hsuduebc2 on 1/29/25, 4:56 PM
by anon115 on 1/30/25, 6:21 AM
by worik on 1/29/25, 6:39 PM
by this15testingg on 1/29/25, 4:17 PM
I'm sure Aaron Swartz would be proud of where the "tech" industry has gone. /s
what problem are these glorified AIM chatbots trying to solve? wealth extraction not happening fast enough?
by mring33621 on 1/29/25, 3:20 PM
by breakitmakeit on 1/29/25, 3:07 PM
The questions I am keenly waiting to observe the answer to (because surely Sam's words are lies): how hard is OpenAI willing to double down on their contradictory positions? What mental gymnastics will they use? What power will back them up, how, and how far will that go?
by tntxtnt on 1/29/25, 3:55 PM
by RohMin on 1/29/25, 3:23 PM
by TheJCDenton on 1/29/25, 3:48 PM
by aucisson_masque on 1/29/25, 11:03 PM
They call it IP theft yet when the New York Times sued OpenAI and Microsoft for copyright infringement they claimed it's fair use of data.