by s_c_r on 8/7/20, 11:56 AM with 19 comments
by daslu on 8/7/20, 1:23 PM
Eventually, it may be a good idea to try both Clojure and Python.
Personally, I find Clojure's approach to data very refreshing. It does require an open mind and a mindset different from the usual one. Eventually, this can bring joy, simplicity and power.
This article by Chris Nuernberger nicely explains what it is about: https://cljdoc.org/d/cnuernber/libpython-clj/1.2/doc/so-many...
Clojure's community is certainly smaller than Python's, but some say it is very friendly.
Below are some beginner-friendly places to chat about it. If you wish, let's chat there, dive into the details, and think about how you could begin exploring.
Clojurians Zulip https://clojurians.zulipchat.com and especially the data-science stream: https://clojurians.zulipchat.com/#narrow/stream/151924-data-...
Clojureverse https://clojureverse.org
by Jugurtha on 8/7/20, 12:52 PM
Congratulations on your new role. Are you joining a team, or are you the team? If you're joining a team, you'll probably use what they're using and learn their tooling before you can try to improve it.
You're doing it in a professional context, so it will be Python. Many blog posts and articles on popular sites such as Medium cover shiny new things, but most of them address one of a few scenarios: portfolio or toy projects; a project with a single person working on it; a project whose data fits on disk and in RAM; and/or a Kaggle project where a good part of the heavy lifting (data acquisition, cleaning, feature engineering, metric identification) has already been done for you. That never happens in real life, because that work is what you're hired for in the first place.
A big problem in this field is the fragmented tooling and experience, which means you have to weave tools together, unless the team you're joining has it figured out and has its internal tooling dialed in. Python dominates. I'm sure other languages are used at other ML shops (we have used Scala in some of our projects), but I think in your situation there's no need to complicate things.
Then again, that is just an opinion. It is not the right answer. The goal is to deliver value.
All the best,
by nikonyrh on 8/9/20, 9:57 PM
Clojure taught me a lot about infinite lazy sequences (kind of like Python's generators) and how to model a program as a pipeline. A good analogy comes from shell programming: there you have stand-alone programs that each handle one task, and you pipe the previous program's stdout into the next program's stdin. In Clojure you'd write stand-alone functions which you "pipe" together with the "->" (thread-first) and "->>" (thread-last) macros. It also ships with several handy functions such as "frequencies", "group-by" and "partition-by". I have ported these and several others to my own Python projects thanks to their versatility and a kind of universality.
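The comment doesn't include those Python ports, so what follows is only one minimal guess at what such helpers might look like, assuming Python 3.9+. The names `frequencies`, `group_by`, `partition_by` and `thread_last` are hypothetical helpers, not part of any library. The last part sketches an infinite lazy pipeline in the spirit of "->>".

```python
# Minimal sketch: hypothetical Python ports of a few Clojure sequence helpers.
from collections import Counter
from functools import partial, reduce
from itertools import count, groupby, islice
from typing import Any, Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


def frequencies(items: Iterable[K]) -> dict[K, int]:
    """Like Clojure's `frequencies`: map each distinct item to its count."""
    return dict(Counter(items))


def group_by(f: Callable[[T], K], items: Iterable[T]) -> dict[K, list[T]]:
    """Like Clojure's `group-by`: collect items into lists keyed by f(item),
    keeping the original order inside each group."""
    groups: dict[K, list[T]] = {}
    for item in items:
        groups.setdefault(f(item), []).append(item)
    return groups


def partition_by(f: Callable[[T], Any], items: Iterable[T]) -> Iterator[list[T]]:
    """Like Clojure's `partition-by`: lazily yield runs of consecutive items
    for which f returns the same value."""
    for _, run in groupby(items, key=f):
        yield list(run)


def thread_last(value: Any, *fns: Callable[[Any], Any]) -> Any:
    """Rough analogue of `->>`: pass value through each one-argument function in order."""
    return reduce(lambda acc, fn: fn(acc), fns, value)


# An infinite lazy pipeline, shell-pipe style: square the naturals,
# keep the even squares, take the first five.
first_even_squares = thread_last(
    count(1),                               # infinite lazy sequence 1, 2, 3, ...
    partial(map, lambda n: n * n),          # collection ends up as the last argument
    partial(filter, lambda n: n % 2 == 0),
    lambda xs: islice(xs, 5),               # islice takes the iterable first
    list,                                   # realize the lazy result
)
print(first_even_squares)  # [4, 16, 36, 64, 100]
```

`functools.partial` fixes the leading arguments so the collection arrives last, which is roughly what "->>" does; `islice` takes the iterable first, so it needs a lambda instead. Because `map` and `filter` are lazy, only as many elements of the infinite `count(1)` are produced as `islice` asks for.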
Oh, and speaking of macros: if you want to get fancy, you can design your own domain-specific language, express your problem in it, and hide all of the boilerplate under the hood. But to get the highest performance you sometimes need to decide whether to use Clojure's immutable data structures or resort to Java's mutable ones, which can perform better (or use a library, I guess). At least on the JVM you can do "real" parallel programming, unlike in the CPython interpreter, where the GIL gets in the way.
Clojure is fun and very educational for all kinds of projects, but in a professional data-analysis setting I'd start with Python and, if it turns out to be a bad fit, do a PoC with Clojure. :)
What a huge topic.
by whalesalad on 8/7/20, 5:58 PM
If you want to get things done: Python. You’ll have no problem getting up to speed based on your past experience, and the ecosystem is orders of magnitude larger than Clojure.
by aynyc on 8/7/20, 1:34 PM
I use Python and Scala. I mostly use Python for small tasks. When I hit large data, I normally use Spark on EMR (PySpark or Scala).
by dfah on 8/7/20, 7:53 PM
by auganov on 8/8/20, 1:38 PM
If you're working in an environment with a lot of collaboration, Clojure might be tough. But if you're actually going to develop software that relies on data analysis (rather than just doing analysis as a one-off), I think Clojure might be worth considering.
by aprdm on 8/7/20, 10:56 PM
Once you're comfortable with it, it might be worth exploring less widely known languages that arguably (this is subjective) have better software design.