by acalderaro on 7/8/17, 12:43 AM with 52 comments
by yakult on 7/8/17, 1:44 AM
If any of it ever becomes commercially released or whatever, there'll need to be a complete rewrite that makes it usable and maintainable by people other than yourself. But most of the code will never get to that point because most of what you've done up until about a week ago is wrong and worthless, and the current, correct-until-next-week iteration is stuck together with duct tape.
Speed only matters on the infrequent hot paths, which is why Python is popular. The rule of thumb is that nobody cares about speed / resource consumption until it needs to run on a cluster, but then you care a lot because cluster time is metered and simulations can get huge. Fortran is still fairly popular because many math libraries are written in it, and porting would require huge effort from a very small group of very busy people.
Most of the coders are not software engineers and don't know / don't follow best practices; on the other hand, the popular best practices are not designed for their use-case and frequently don't fit. Versioning (of the I-don't-know-which-of-the-fifty-copies-on-my-laptop-is-the-right-one type) is a big issue. Data loss happens. Git/GitHub/etc. has a steep learning curve, but so do all the various workflow systems designed for research use.
by scott_s on 7/8/17, 1:20 AM
In production software, this is flipped. Every feature claim needs to have an associated test, as it's a contract with your user. But when it comes to performance, everyone just waves their hands.
I'm being a little glib. But production software has to work. You'll spend far more time dealing with all of the "less interesting" details and edge cases than with research software. As ams6110 points out, this means more focus on testing, maintenance and good design. But I do want to emphasize testing - sometimes you'll spend more time testing something than actually implementing it. There's also often many more residual effects from dependencies elsewhere in the ecosystem you're working in. That's not typical in academic software.
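A minimal sketch of the "every feature claim needs an associated test" point, using Python's stdlib unittest (the function and its documented behavior are hypothetical, invented for illustration):

```python
import unittest

def parse_version(s):
    """Feature claim: accepts 'X.Y.Z' strings and returns an (int, int, int) tuple."""
    major, minor, patch = s.split(".")
    return int(major), int(minor), int(patch)

class TestParseVersion(unittest.TestCase):
    # Each documented behavior gets a test, so the claim is checked, not just asserted.
    def test_plain_version(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_rejects_malformed_input(self):
        # The "less interesting" edge case: production code must define
        # behavior for bad input, not just the happy path.
        with self.assertRaises(ValueError):
            parse_version("1.2")

if __name__ == "__main__":
    unittest.main()
```

The second test is the kind of thing research code typically skips, and it is exactly where the extra testing time goes.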
by notacoward on 7/8/17, 2:00 AM
Ironically, an academic might get to spend a higher percentage of their time on pure coding than a professional coder does. They have other concerns. Maintainable code is not part of the desired outcome. It's consumable and expendable, not durable, so any time spent making it any better than "just barely good enough" is wasted. Why build a tank when all you need is a bicycle?
[1] At least in expectation. Some academic code lives on far longer than its authors intended, and some non-academic code vanishes pretty darn quickly. But in general, both the intent and the expectation are that non-academic code will live longer.
by throwaway-emc2 on 7/8/17, 3:29 AM
This is the biggest difference between academic and professional programming in a single pithy statement, from a paper that Knuth wrote.
by ams6110 on 7/8/17, 1:05 AM
by tytso on 7/8/17, 3:42 AM
A few years back, some of the researchers (professors and graduate students) claimed they were interested in more testing and in possibly taking some of their work (Betrfs[1], specifically) and productionalizing it. In response, I spent a lot of time on the kvm-xfstests[2] and gce-xfstests[3][4] testing infrastructure, cleaning it up, making it work in a turn-key fashion, and providing lots of documentation.
[2] https://github.com/tytso/xfstests-bld/blob/master/Documentat...
[3] https://github.com/tytso/xfstests-bld/blob/master/Documentat...
[4] https://thunk.org/gce-xfstests
Not a single researcher has used this code, despite the fact that I made it so easy that even a professor could use it. :-)
The problem is that trying to test and productionalize research code takes time away from the primary output of Academia, which is (a) graduating Ph.D. students, and (b) writing more papers, lest the professors perish. (Especially for those professors who have not yet received tenure.) So while academics might claim that they are interested in testing and trying to get their fruits of the research into production code, the reality is that the Darwinian nature of life in academia very much militates against this aspiration becoming a reality.
It turns out that writing a new file system really isn't that hard. What takes a long time is testing the file system, finding all of the edge cases, optimizing it, making it scale to 32+ CPUs, and other such tasks needed to turn it into a production-ready system. How long it's taken for btrfs to become stable is a good example of that fact. Sun worked on ZFS for seven years before they started talking about it externally, and then it was probably another 3 years before system administrators started trusting it with their production systems.
by coherentpony on 7/8/17, 3:35 AM
Professional coders are paid to code.
by austincheney on 7/8/17, 2:56 AM
Academia isn't preparing developers for this reality. Many will try to fake it or hide behind imposter syndrome, which is fine if everybody in the company is an imposter; otherwise it is plainly obvious you are incompetent.
by joeclark77 on 7/10/17, 5:08 PM
If you are talking about computer science academics, of course, that's a horse of a different color. In that case, the code is the topic, so I would guess that they're providing it! On the other hand, the majority of such research is probably solving niche problems and special cases, so it may not be very usable in your professional coding.
by snovv_crash on 7/8/17, 7:32 AM
In contrast, industry doesn't let you choose the problem: you need to solve whatever the problem is that the client has. This means generalising a lot further and having a less optimal solution that is more robust to input error or poorly calibrated measurements. Even if it does fail you should be able to identify why and explain to the user what they did wrong.
In academia this feedback generally goes to the person who wrote the software, so a cryptic error message including some algorithmic details might be enough to debug the inputs.
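A toy sketch of that contrast (the function and error texts are hypothetical): the research-style version lets a bad input surface as a cryptic error from deep inside the algorithm, while the industry-style version validates inputs and tells the user what they did wrong.

```python
import math

# Research-style: assumes well-formed input. A bad reading surfaces as a
# cryptic "math domain error" from inside the algorithm, meaningful only
# to whoever wrote the code.
def log_ratio_academic(signal, baseline):
    return math.log(signal / baseline)

# Industry-style: validates inputs and explains the likely cause,
# so a user who didn't write the software can act on the message.
def log_ratio_production(signal, baseline):
    if baseline <= 0:
        raise ValueError(
            f"baseline must be positive (got {baseline}); "
            "check the sensor calibration before re-running"
        )
    if signal <= 0:
        raise ValueError(
            f"signal must be positive (got {signal}); "
            "a negative reading usually means a miswired input channel"
        )
    return math.log(signal / baseline)
```

With a bad reading, the first raises a bare math-domain error; the second names the offending parameter and suggests a fix.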
by rglullis on 7/8/17, 8:15 AM
by hprotagonist on 7/8/17, 1:33 AM
This informs my design choices quite a bit.
by bastijn on 7/8/17, 6:33 AM
On a more serious note: in addition to what others have already mentioned about quality, performance and so on, I'd like to add that in a professional career you will most likely work with a (larger) team. This means you will run into code conflicts, where code is reused for different purposes and you cannot simply change it. In addition, you have to think about readability and documentation, as your colleagues have to be able to understand the code without losing too much time or needing you.
You will also always have to work with legacy code. Most likely code you want to change but can't, given the timelines.
You will have to sync your design with many others. You might have to convince them or discuss issues with conflicting requirements or deadlines. There will be times you can't finish your entire design and have to think of a staged introduction or even harder, change it so it can work with only 50% of the design.
Also, your code has to run for many years. You can't simply take an experimental third-party package maintained by a single person. Too risky. You have to think about hardware expiring or no longer being supported (especially with GPUs).
You have to think about licenses. Academia is usually free; professionally you have to take a close look.
by cbanek on 7/8/17, 3:31 AM
by dasmoth on 7/8/17, 9:29 AM
Also, the focus on building software in teams seems to lead to architectures that need teams (vs. suites of manageable-size, "do one thing well" tools).
Slightly different take on this: http://yosefk.com/blog/why-bad-scientific-code-beats-code-fo...
by pgbovine on 7/8/17, 2:11 AM
by stared on 7/8/17, 7:55 AM
by guscost on 7/8/17, 1:20 AM
by notadoc on 7/8/17, 1:23 AM
Professional is often whatever works
This is fairly common with many academic vs professional differences, btw
by sonabinu on 7/8/17, 3:13 AM
by santaclaus on 7/8/17, 3:25 AM
by ramgorur on 7/8/17, 4:36 AM
by booleandilemma on 7/8/17, 4:04 AM
by kleer001 on 7/8/17, 1:17 AM
by lapinrigolo on 7/8/17, 9:26 AM
by kobeya on 7/8/17, 6:58 AM
by tudorw on 7/8/17, 7:15 AM
by Radim on 7/8/17, 4:24 AM
https://news.ycombinator.com/item?id=14692691
Copy-pasting my response there:
---
Why is code coming out of research labs/universities so bad?
1. DON'T SEE WHY CLEAR CODE MATTERS
Academic projects are typically one-offs, not grounded in a wider context or value chain. Even if the researcher would like to build something long-term useful and robust, they don't have the requisite domain knowledge to go that deep. The problems are more isolated, there's little feedback from other people using your output.
2. DON'T WANT TO WRITE CLEAR CODE
Different incentives between academic research (publications count, citation count...) and industry (code maintainability, modularity, robustness, handling corner cases, performance...). Sometimes direct opposites (fear of being scooped if research too clear and accessible).
3. DON'T KNOW HOW TO WRITE CLEAR CODE
Lack of programming experience. Choosing the right abstraction boundaries and expressing them clearly and succinctly in code is HARD. Code invariants, dependencies, comments, naming things properly...
But it's a skill like any other. Many professional researchers have never participated in an industrial project, so they don't know the tools or how to share and collaborate (git, SSH, code dissemination...), and they haven't built that muscle.
The GOOD NEWS is, contrary to popular opinion, it doesn't cost any more time to write good code than bad code (even for a one-off code base). It's just a matter of discipline and experience, and choosing your battles.
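The "naming things properly" and code-invariants points above can be illustrated with a tiny before/after (both functions are hypothetical and do the same work; only the second states its contract):

```python
# Before: works, but nothing about intent or invariants is visible.
def f(a, b):
    return [x for x in a if x > b]

# After: same behavior, but the names and docstring carry the contract,
# so a reader doesn't have to reverse-engineer what 'a' and 'b' mean.
def readings_above_threshold(readings, threshold):
    """Return the readings strictly greater than threshold.

    Invariant: input order is preserved; the input list is not modified.
    """
    return [r for r in readings if r > threshold]
```

Both take the same time to write, which is the point about discipline rather than extra effort.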