by OmegaHN on 8/20/12, 4:03 AM with 109 comments
Java seems to fit this role very well. It is statically typed, object-oriented, and doesn't delve into memory. However, it seems to get a lot of hate (or, at least, dismissal) from many programming communities, so I am asking, why not Java? Why is it so horrible as a systems language above C? Is there any other language that fits this role in a better way?
I am asking in particular because I have been banging my head against the Python syntax for a while, but I am trying to expand what languages I can program in.
by strlen on 8/20/12, 4:47 AM
The hate against Java comes from using Java for application development: this is largely due to the kinds of applications that are typically written in Java (line of business software) and (this is the most important reason) accidental complexity and low quality of APIs like Spring or J2EE.
The recipe for programming happiness is to use the right tool for the job:
* Python (or Ruby) for web application development, development tools, and "devops" scripting.
* C (or C++) for pieces that need deterministic performance[1], provide a "native" feeling user interface, or require control over memory layout.
Note: performance and efficiency are relative to what your throughput and latency requirements are. Google's crawlers and indexers will remain in C++ for the foreseeable future, but (for example) crawlers for an intranet can get away with being in Java (or Python for that matter).
* Java (or Scala, Haskell, OCaml, Go, Erlang, or one of the many Lisps) for "userland" systems programming. If the majority of the system fits under the last bullet point, use C++.
* Avoid JNI or Swig if you can. Use JSON + REST for cross-language RPC. If you need performance guarantees of a tight binary protocol use Thrift or Protocol Buffers. If you have to use JNI, consider using JNA first.
* No matter what language you use, stick to high quality libraries and tools. For Java, you'll absolutely want to use guava, Guice, and either Netty (or NIO.2 if you are using Java 7) or Jetty + Jersey + Jackson (for REST APIs).
Pick up either emacs and cscope, netbeans, Eclipse, or IntelliJ for navigating a large Java codebase.
All Java build tools suck. Maven sucks less and is the de-facto standard in the open source community. Twitter's "pants" is also worth looking at.
* Don't touch Spring with a 60-foot pole: in the mildest terms it's unequivocal and absolute garbage. Ditto for any other buzzword you may see in a job listing for an "enterprise" Java development job (with 20 years of experience required, naturally).
[1] Java performance can be quite high, but a JIT-ted and garbage collected runtime implies a lack of determinism.
by gojomo on 8/20/12, 4:47 AM
But, Java's a bit verbose, has gaps in concise support for higher-level constructs, and sometimes the static typing gets in the way. So if you don't find those parts helpful -- some do -- and think your performance targets can be met with other later optimizations/design-choices/selective-reimplementations, stick with whatever more concise language you're good at.
Or, use any of the more concise languages available on the JVM allowing intermixing of the occasional Java facility, like Jython, JRuby, Groovy, Javascript, Scala, Clojure, and others.
(If efficiently handling massive numbers of concurrent net/IO streams is a priority, the recent JVM-based project vert.x may be of interest. I haven't used it for anything but toy tests, but it seems to combine some of the best-practices for maximum JVM IO throughput with a somewhat higher-level-language-agnostic top layer well-suited for servers/proxies/crawlers.)
by Derbasti on 8/20/12, 5:51 AM
This is probably a consequence of the verbosity of Java-the-language, which made heavy tooling support a necessity, and of Eclipse, which provides one of the tightest language integrations with Java of any IDE ever.
The sad thing is that this is not really the fault of Java-the-language or Eclipse. It did spawn a whole caste of very mediocre programmers and libraries though, which can make for a very unpleasant culture.
Used correctly, Java can be a great tool, though.
by slurgfest on 8/20/12, 5:31 AM
If you want to use Java (e.g.: you know it already and don't like learning other things), who cares? Why is this an issue where you have to challenge other people's opinions of Java? Use it if you want to.
by rbanffy on 8/20/12, 4:22 AM
Another approach could be Jython (or any other JVM language closer to the desired level of abstraction) and Java.
I don't have much love for Java the language. It's not much easier to program than with C, isn't faster and is very verbose. Still, what you are doing looks like a good match for it. And all the respect I don't have for the language, I have for the JVM.
I wouldn't use it for web app development as there are much more productive options around.
by rockyj on 8/20/12, 5:01 AM
One can write concurrent systems in Java without understanding concurrency. Languages like Scala and Clojure will give you some freedom but will also enforce certain design principles which will save you.
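A minimal sketch of that point, assuming a shared counter as the workload (the class and method names are made up for illustration). Swapping the `AtomicInteger` for a plain `int` field and `count++` would still compile and run, but could silently lose updates; nothing in the language forces the safe version.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounterSketch {
    // Runs `threads` workers that each increment one shared counter `perThread` times.
    static int countWithAtomic(int threads, final int perThread) {
        final AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        counter.incrementAndGet(); // atomic read-modify-write
                    }
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(4, 10000)); // prints 40000
    }
}
```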
Similarly for web development, there are scores of frameworks in the Java world, and you can mess it up easily. Rails / Django on the other hand will provide one good, solid way to do web programming.
Finally, Java is showing its age. The need to write large files of XML to configure things and the lack of ability to treat functions as objects put developers off. Some things are being addressed by Oracle but will take time.
by btilly on 8/20/12, 5:17 AM
Seriously, there is a fairly direct translation from any Java you might want to write to completely equivalent Python. Sure, Python offers more complex techniques such as list comprehensions and iterators. But you don't need to use them. You can just write Java-like Python.
by pacala on 8/20/12, 5:21 AM
* First class functions (interfaces with one method) plus garbage collector eventually encourage a functional programming style, with lots of little objects created on the heap. Alas, the per-object memory overhead of popular Java implementations is horrendous.
* Strong emphasis on using threads for concurrency. Alas, in practice, threads are incredibly large memory hogs.
* Verbosity. While it is possible to write clean composable code in Java, it is also remarkably verbose. After a while, this gets old and people take every shortcut they can to limit verbosity. Which is a very bad idea. To quote an esteemed colleague, "I never took a shortcut I didn't regret later". Can we have our lambdas yet, pretty please?
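To make the first and last points concrete, here is a rough sketch of a "function" passed as a one-method interface, written with an anonymous class since the language has no lambdas. `Transform`, `map`, and `lengths` are hypothetical names for illustration, not a library API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OneMethodInterfaceSketch {
    // Hypothetical single-method interface standing in for a function type.
    interface Transform<A, B> {
        B apply(A input);
    }

    // Applies the transform to every element and collects the results.
    static <A, B> List<B> map(List<A> items, Transform<A, B> f) {
        List<B> out = new ArrayList<B>();
        for (A item : items) {
            out.add(f.apply(item));
        }
        return out;
    }

    static List<Integer> lengths(List<String> words) {
        // Several lines of ceremony wrap the one expression that matters: word.length().
        // Each call also allocates a small function object on the heap.
        return map(words, new Transform<String, Integer>() {
            @Override
            public Integer apply(String word) {
                return word.length();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("java", "scala", "go"))); // prints [4, 5, 2]
    }
}
```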
by phao on 8/20/12, 7:06 AM
Notice, though, that competent people have done great jobs using these languages. So you have some choices. Two of them are: wonder why people bash Java or go do something useful with it. I suggest you do the second.
The key to using programming languages is in trying to use the one which will help you the most, or get in your way the least. Sort of "the right tool for the job". Idk what jobs java is good at. If you found out that it's good for your project, then use it.
Take a look at this article: http://prog21.dadgum.com/143.html
by orangecat on 8/20/12, 4:52 AM
It's not "horrible", it just has many slight-to-moderate deficiencies and annoyances that make development more work than it should be.
Is there any other language that fits this role in a better way?
Scala is strictly superior when used as a "better Java". (If you go deep into its functional capabilities you get a different set of tradeoffs). C# is better as a language, but then you're tied to .NET.
Really we'd need to know more details of what you're doing and why you believe Python may not work. Are you concerned about performance, or do you need to do things that Python doesn't have convenient APIs for?
by nostromo on 8/20/12, 5:44 AM
I think most people on HN who hate Java are talking about creating websites, and for good reason. Back in the bad ol' days, people would use Java frameworks like Struts for web apps, and it was quite painful.
For my latest project I'm using Play Framework for front-end Java, and it's quite delightful.
by freeslave on 8/20/12, 4:55 AM
by samspot on 8/20/12, 5:26 AM
The best reason to AVOID using Java is the huge demand for Java programmers and the low supply. At my job we can barely find applicants with Java so we end up hiring .NET people and converting them.
by bbayer on 8/20/12, 8:02 AM
Python is very powerful in terms of string manipulation because it has very good language constructs (like slice syntax) which make development easy. At the beginning it might be a little confusing, but once you have mastered it you really feel the power.
Frameworks like Twisted also do a good job here. Twisted is well designed, asynchronous, and well suited to multi-tier network applications.
by jfb on 8/20/12, 5:00 AM
by Mikera on 8/20/12, 7:07 AM
You can safely ignore the people who bash Java - they are generally clueless. The Java language is perfectly fine: high performance, statically typed, OOP, relatively simple and maintainable. It may not offer the most concise code and it may not have all the "trendy" language syntax features but guess what - that actually doesn't matter much in the real world (i.e. outside the realm of language designers and fanboys). If saving a few characters of typing is your major concern when choosing a language, you have much bigger problems.
But the real strength in Java is not the language but rather the overall platform - the combination of the JVM (which is an amazing high performance feat of engineering), the library ecosystem (which is the best overall for any language), the tools (great IDEs, Maven, a host of other developer-focused tools), the fact that the OpenJDK itself and most of the libraries are open source and the portability (compiled JVM code is extremely portable, and importantly doesn't need a recompile unlike some other so-called "cross-platform" languages).
So overall you can't really go wrong with choosing Java for server side applications. Although I would also give Clojure or Scala a look - if you are after "powerful" languages then these two are pretty amazing and you still get all the benefits of being on the Java platform.
by jvvlimme on 8/20/12, 2:55 PM
That being said, it doesn't really matter what language you write your crawler in: its performance will much sooner be influenced by other aspects (network latency, storage, etc) than the language you choose.
So pick the language you're most comfortable with for crawling and offload the data processing to a lower level language that is better suited for that task.
by Tichy on 8/20/12, 7:45 AM
webInfo = {url: "bla.bla", title: "bla die blub", links: ["link1", "link2"]}
Notice that webInfo contains two different types, Strings and Arrays. In Java arrays or hashes you cannot easily mix types - you'll end up just putting objects everywhere, then be forced to litter the code with type casts. Or you create an unwieldy class hierarchy. That is my prediction, anyway - I am too lazy to come up with a good example :-(
You can also not simply write something like the hash above. The nearest you can get is if you have created that class hierarchy with suitable constructors, you could instantiate that in one go. At least that is my memory - I have now avoided it for so long that I am not even sure how to instantiate an Array or a Hash with data on the fly anymore.
I think instantiating an array with data goes something like
links = new String[]{"bla", "blub"}, and there is nothing like that for Hashes - you are stuck with
info = new HashMap<String, Object>();//generics are particularly ugly and annoying
info.put("links", new String[]{"bla", "blub"});
info.put("title", "some stupid web site");
info.put("url", "undisclosed");
And so on - a far cry from the example above. (Note the Java syntax is probably wrong, created from memory - but it is something like that).
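A compilable version of the map approach above, as a rough sketch. The class and method names (`MixedMapSketch`, `buildInfo`, `firstLink`) are made up for illustration; the keys and values come from the webInfo example. It also shows the cast that reading a value back requires.

```java
import java.util.HashMap;
import java.util.Map;

class MixedMapSketch {
    // Builds the mixed-type map; every value is stored as a plain Object.
    static Map<String, Object> buildInfo() {
        Map<String, Object> info = new HashMap<String, Object>();
        info.put("url", "bla.bla");
        info.put("title", "bla die blub");
        info.put("links", new String[] {"link1", "link2"});
        return info;
    }

    static String firstLink(Map<String, Object> info) {
        // The compiler only knows this is an Object, so a cast is needed.
        // A wrong cast compiles fine and fails at run time with ClassCastException.
        String[] links = (String[]) info.get("links");
        return links[0];
    }

    public static void main(String[] args) {
        System.out.println(firstLink(buildInfo())); // prints link1
    }
}
```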
Even if you went through the mind numbing work of creating appropriate classes, you'd be stuck with
info = new WebInfo(title, url, new String[]{link1, link2,...});
And that is just for two different types, and notice that there is no way to see what the name of the parameters of the WebInfo constructor actually are from that snippet of code.
title: someTitle
is actually much more readable because you can instantly see that someTitle is supposed to be a title.
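For comparison, a minimal sketch of what the class route could look like. Only the `WebInfo` name and the argument order come from the snippet above; the fields, getters, and sample values are assumptions. Because title and url are both Strings, swapping them at the call site still compiles, which is the readability problem being described.

```java
class WebInfo {
    private final String title;
    private final String url;
    private final String[] links;

    // Positional parameters: the call site shows values but not their names.
    WebInfo(String title, String url, String[] links) {
        this.title = title;
        this.url = url;
        this.links = links.clone(); // defensive copy so callers cannot mutate our array
    }

    String getTitle() { return title; }
    String getUrl() { return url; }
    String[] getLinks() { return links.clone(); }

    public static void main(String[] args) {
        WebInfo info = new WebInfo("bla die blub", "bla.bla", new String[] {"link1", "link2"});
        System.out.println(info.getTitle() + " has " + info.getLinks().length + " links");
    }
}
```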
Also if you want to use NoSQL, I suspect converting java classes to JSON could be a pita, too.
by ljw1001 on 8/20/12, 12:32 PM
Unless you're building something that needs to be (1) highly dynamic (like a web-based spreadsheet where you don't know the column types until run-time), or (2) true real-time software, you're probably better off using Java. Some libraries do suck as others wrote, but it's the volume of good libraries you care about. In any case, I'd argue that in many alternate languages, the code you're writing so quickly doesn't need to be written at all in Java, because there's a library for it.
Verboseness is a fact in Java, but a decent IDE shields you from that as well. With Java it takes a little longer to get things done, but (in my experience) you spend less time tuning performance, fixing problems in the underlying tools or language, or just dealing with your own bugs and keeping things running. Since most development is maintenance, you want to optimize for that.
by mseepgood on 8/20/12, 4:58 AM
Go?
by NTH on 8/20/12, 5:28 AM
You should probably check existing web crawler solutions to see if you can adapt them before rolling your own.
by lelele on 8/22/12, 8:26 PM
We may say that with the current crop of languages running on the JVM, Java is a low-level language. It is to the JVM what C is to hardware. You avoid coding in both when you have higher-level languages available which will make you more productive.
But when you want to optimize performance on the JVM for specific chunks of your application - without resorting to JVM bytecode of course - Java is the right choice.
by exelib on 8/20/12, 5:23 AM
by dotborg2 on 8/20/12, 2:01 PM
In a case like a web crawler, the main issue with Java is scalability, or rather the lack of it. You need to code it yourself, but that's no different from other languages and platforms.
by spullara on 8/20/12, 5:17 AM