from Hacker News

What is going on with measures of programming language popularity?

by trapatsas on 9/30/18, 6:20 PM with 86 comments

  • by dzdt on 10/1/18, 3:52 AM

    The article omits the Redmonk ranking [1], which is imho much better than TIOBE. The Redmonk approach uses GitHub and Stack Exchange as complementary data sources, and shows a two-dimensional main result.

    The top 20 in the Redmonk ranking are

      1 JavaScript
      2 Java
      3 Python
      4 PHP
      5 C#
      6 C++
      7 CSS
      8 Ruby
      9 C
      9 Objective-C
      11 Swift
      12 Scala
      12 Shell
      14 Go
      14 R
      16 TypeScript
      17 PowerShell
      18 Perl
      19 Haskell
      20 Lua
    
    Considering how different languages occupy different niches, the relative ranking within niches seems reasonable. And ranking across niches scales according to the size of the niche.

    Client-side web scripting is a huge niche, hence JavaScript is in the top spot. General purpose desktop/server applications are another huge niche, served by Java, C#, C++, C, Scala, Go (in that order). And so on.

    It seems pretty sane, and the results have been stable enough to interpret the rise of upcoming languages and see the resilience of top contenders.

    [1] https://redmonk.com/sogrady/2018/08/10/language-rankings-6-1...

  • by munificent on 10/1/18, 8:52 PM

    The TIOBE index is hot garbage, and always has been.

    Even so, this article is also pretty bad. Nowhere does the author define what he means by "popularity", which is one of the key problems all of these popularity contests suffer from.

    If we say "Language X is more popular than language Y", what does that mean? Answers could be any of:

    * Total extant corpus of X is larger than Y.

    * Number of people who know X is larger than Y.

    * Amount of open source code in X is greater than Y.

    * Number of people currently writing X (how much?) is greater than Y.

    * Number of people who want to be writing X is greater than Y.

    * Number of jobs available for writing X is greater than Y.

    * Number of people talking about X is greater than Y.

    These are all wildly different metrics but all have reasonable claims to represent "popularity" and/or are what some of these rankings claim to show.

    To do anything useful, you really need to know what problem the reader is trying to solve and pick a metric that helps that problem. Is the reader trying to decide what language to learn to find a job today? To get ahead of the curve and be an expert in five years? To discover a new exciting language? To choose a language to use for a large, conservative project? A small ambitious one?

  • by yen223 on 10/1/18, 2:55 AM

    What's going on is that there are many equally valid definitions of "popularity". Do you mean the number of developers who have heard of a language? Who use a language? Who like a language? Or do you mean a language that's most widely deployed, for some definition of deployed?

    The question you need to ask is what decision is going to be impacted as the result of knowing the "popularity" of some language. The answer to that question will guide you towards the right metric you should be looking at.

  • by thebooglebooski on 10/1/18, 7:09 PM

    I don't really understand how you can measure popularity without doing a census-like survey, reaching out to every developer possible in a coordinated effort.

    A while back, I took a statistics class that explained how polling everyone by calling everyone with a phone landline resulted in a misrepresented sample.

    The reason: not everyone has a phone landline. And those with phone landlines had a tendency to belong to specific demographics.

    So... if you survey everyone using GitHub, and not every developer uses GitHub, are you not prone to the same fallacy?
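
    To make the landline analogy concrete, here is a minimal sketch with invented numbers. The two languages, their developer counts, and the fraction of each community that uses GitHub are all assumptions for illustration, not real data. It only shows how unequal coverage distorts the observed share.

```python
# Hypothetical population: language -> (developers, fraction who use GitHub).
# Every figure here is invented purely for illustration.
population = {
    "LangA": (1000, 0.9),
    "LangB": (1000, 0.3),
}

def true_share(pop):
    """Share of all developers per language (what a census would show)."""
    total = sum(n for n, _ in pop.values())
    return {lang: n / total for lang, (n, _) in pop.items()}

def github_share(pop):
    """Share per language among only the developers visible on GitHub."""
    visible = {lang: n * rate for lang, (n, rate) in pop.items()}
    total = sum(visible.values())
    return {lang: v / total for lang, v in visible.items()}

print(true_share(population))    # both languages are equally common
print(github_share(population))  # LangA looks three times as common as LangB
```

    Both languages have the same number of developers, yet the GitHub-only view reports a 75/25 split.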

  • by observr9 on 10/1/18, 7:52 PM

    How about available jobs? I suspect a lot of people are trying to determine which language to invest their time in learning.

    Another relevant measure would be ALL jobs, not just available. Some people may not care if a million people are using language X if they're not getting paid for it.

  • by drawkbox on 10/1/18, 7:20 PM

    > TIOBE measures the sheer quantity of search engine hits. PYPL measures how often language tutorials are Googled.

    Languages like Java or C/C++ score high when popularity is measured by who searches for language help, mostly because they are more difficult languages with bigger historical sprawl and more frameworks.

    Using search alone as a popularity measure is skewed because people search more for things they don't know than for things they do, so harder languages will probably be searched more. The same goes for tutorials: they only show what people are learning. However, search is somewhat valid in that even for languages you use daily you end up searching for solutions and for information in docs, community sites, etc.

    The GitHub and repo stats, combined with searching and surveys, probably give a bigger picture of popularity.

    Side note: Personally I think everyone should learn C/C++ and maybe a functional language or a dynamic language like Python in addition to their main languages. Learning C/C++ and building in it gets you closer to the metal and teaches memory management, and every language you learn afterwards is less difficult. I love C++ for game development, but it also helps with learning other languages, since their difficulty is all downhill from it. It is especially great for understanding memory management, stack vs. heap, and value vs. reference semantics, and C/C++ is empowering in the power and speed it gives you on a platform. To this day most apps are still built with C/C++ under the hood, whether directly, exported to it, or running in a virtual machine that is built in it. For highly performant apps, code, and systems, C/C++ is still king.

  • by bacon_waffle on 10/1/18, 9:21 PM

    My day job mostly involves C code, which lives in a company repository; I only rarely search for C-specific stuff. Hobby projects - work on OSS in C++/Python, or learning Rust - must result in an order of magnitude more of my language-specific searching and public committing, even though they represent a small fraction of my programming time.

    So, for me, searching is mainly related to learning about a language, not so much to using it. This makes me wonder if the integral over time of search volumes might be a useful measure of language use?
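
    A rough sketch of that idea, treating the "integral" as a running total over monthly buckets. The function name and the sample volumes are hypothetical, and whether the result actually tracks language use is exactly the open question.

```python
from itertools import accumulate

def cumulative_search_volume(monthly_volumes):
    """Running total (a discrete integral) of monthly search volume."""
    return list(accumulate(monthly_volumes))

# Invented example: heavy searching while learning, tapering off with experience.
monthly = [100, 80, 40, 10, 5]
print(cumulative_search_volume(monthly))  # [100, 180, 220, 230, 235]
```

    The running total keeps growing after the searching tapers off, so it credits a language for past learning even when current search volume is low.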

  • by Apocryphon on 10/1/18, 7:57 PM

    The article underestimates how much legacy Objective-C code there is in iOS apps; even Apple is a long way off from replacing it all with Swift.

  • by hirundo on 10/1/18, 7:02 PM

    As a Rubyist I'm pleased with the GitHub ranking, but how much of that is tilted by GitHub being a Ruby platform initially colonized by Rubyists?

  • by innocentoldguy on 10/1/18, 8:57 PM

    I don't put a lot of stock into TIOBE and similar lists because I'm not really sure how that information helps. It seems to me that these lists are measuring ubiquity rather than popularity, which I define as being well-liked. I guess there is value in that, but I'm far more interested in seeing a ranked list of well-liked languages and the reasoning behind those opinions. Slant isn't perfect, but it offers me the sort of pros vs. cons lists that I find much more valuable when selecting a language.

    Job prospects/salary is another ranking I'm interested in. For example, I know I can get a JavaScript job on any corner, but is that ubiquity going to cost me in salary? Can I earn more by mastering something like Rust, Elixir, or Erlang?

  • by davidw on 10/1/18, 10:14 PM

    I used to do a language popularity thing, langpop.

    One of the takeaways is that there are different metrics and none of them is perfect.

    So I grabbed what I could, combined them, and even had a JS thing where you could weight them differently.

    Stuff I looked at: raw search, source code, books, job advertisements.
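
    For anyone curious what combining and weighting might look like, here is a minimal sketch. It is not the actual langpop code. The normalization (each metric converted to a share of its own total) and all the numbers are assumptions for illustration.

```python
def combined_score(metrics, weights):
    """Weighted average of per-metric shares.

    metrics: {metric_name: {language: raw_count}}
    weights: {metric_name: weight}, one entry per metric
    """
    total_weight = sum(weights.values())
    scores = {}
    for name, counts in metrics.items():
        metric_total = sum(counts.values())
        for lang, count in counts.items():
            share = count / metric_total  # normalize so metrics are comparable
            scores[lang] = scores.get(lang, 0.0) + weights[name] * share / total_weight
    return scores

# Invented raw counts for two metrics and two languages.
metrics = {
    "search": {"X": 60, "Y": 40},
    "jobs": {"X": 20, "Y": 80},
}

print(combined_score(metrics, {"search": 1, "jobs": 1}))
print(combined_score(metrics, {"search": 3, "jobs": 1}))
```

    With equal weights Y comes out ahead; weighting search three times as heavily as jobs puts the two languages level, which shows how much the final ranking depends on the weights you pick.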

  • by analog31 on 10/1/18, 7:27 PM

    I've used the language popularity indexes to show MBAs that my preferred tool chain isn't some weird obscure thing that I just discovered on the Internet. The numbers were good enough for that purpose, and of course they didn't question the underlying methodology. Perhaps the more useful message for them is that there are in fact more than one or two popular languages, and that they are not all commercial products.

  • by zmmmmm on 10/2/18, 1:43 AM

    The big problem is all the major indexes are driven by search engine measures, and those themselves are highly sensitive to any change in the underlying search engine methodology itself. So a small change in Google's ranking algorithm will see a language move up or down dozens of places.

    I would tend to focus more on job ads personally because those are at least linked to "real intent" to use a language for something tangible. You don't put a language in an ad because it is controversial or has been in the news a lot lately, both of which can lead to it being Googled a lot or getting a spurt of search activity.

    GitHub's index is nice in that it is based on actual code, but then it's also heavily biased by what is open source and therefore doesn't fully reflect industry use of languages (which is why I think it deviates from other indexes by putting JavaScript and Python a bit higher, and less emphasis on, say, Java and C#).

  • by DoreenMichele on 10/1/18, 10:20 PM

    There is an excellent book on such things that I highly recommend: How to Lie with Statistics

    Problems like this are part of why we have sayings like:

    Measure twice, cut once.

    GIGO (Garbage in, garbage out)

    Some saying about measuring to extreme precision, then "cutting with an axe."

  • by 13415 on 10/1/18, 9:37 PM

    Instead of popularity indices, I'd like to see the percentage that a language+framework takes in applications of a certain type, such as desktop applications, client-side web applications, server applications, data crunching, embedded, etc. That way you can spot trends or find alternative languages for a certain task.

    Another interesting metric would be the number of open source libraries for the language that have had commits within the last 3 months and whose major semantic version is 1.0 or higher (or some other way to weed out unfinished libraries), sortable by license.
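
    A sketch of how that second metric could be computed, assuming you already have library records from somewhere. The record fields (`language`, `version`, `last_commit`) and the sample data are hypothetical, and 90 days stands in for "3 months".

```python
from datetime import date, timedelta

def is_active_and_stable(lib, today, max_age_days=90):
    """True if the major version is >= 1 and the last commit is recent."""
    major = int(lib["version"].split(".")[0])
    recent = today - lib["last_commit"] <= timedelta(days=max_age_days)
    return major >= 1 and recent

def count_by_language(libs, today):
    """Count qualifying libraries per language."""
    counts = {}
    for lib in libs:
        if is_active_and_stable(lib, today):
            counts[lib["language"]] = counts.get(lib["language"], 0) + 1
    return counts

# Invented records for illustration only.
libs = [
    {"language": "Go", "version": "1.4.2", "last_commit": date(2018, 9, 1)},
    {"language": "Go", "version": "0.9.0", "last_commit": date(2018, 9, 20)},
    {"language": "Lua", "version": "2.0.0", "last_commit": date(2017, 1, 5)},
]
print(count_by_language(libs, today=date(2018, 10, 1)))  # {'Go': 1}
```

    Only the first Go library counts: the second is pre-1.0 and the Lua one has gone stale. Sorting by license would just mean adding a license field to the grouping key.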

  • by meddlepal on 10/1/18, 7:34 PM

    As Mark Twain eloquently put it... there are lies, damned lies and statistics.

  • by jaequery on 10/1/18, 8:32 PM

    JavaScript is ambiguous these days. What percentage of JavaScript's popularity comes from the backend (Node.js) and how much from the client side (HTML, React, Angular, etc.)?

  • by vorg on 10/1/18, 10:01 PM

    TIOBE's faulty ranking method isn't even reliable. Three years ago Apache Groovy was in the top 20, but six months ago it wasn't even in the top 50. Now it's supposedly rising rapidly again. And this isn't the first time this has happened with Groovy -- it was also in the top 20 in Oct 2013 but three months later out of the top 50. I suspect TIOBE's rankings are being gamed somehow.

  • by sseth on 10/2/18, 4:22 AM

    GitHub ranking by opened pull requests feels to me like the best approach if what you are interested in is trends in language adoption.

    Measuring total corpus would be measuring past popularity and may miss trends. Search metrics, on the other hand, may actually overrate newer languages, which may be searched more than more mature languages. Opened pull requests sound like a better metric overall.

  • by CryoLogic on 10/1/18, 6:53 PM

    I have absolutely no doubts that the most used language is JavaScript - considering nearly every website in existence relies on it.

    So it's really just a question of methodology. When TIOBE says popularity they must not be talking about usage rates, or if they are perhaps they are specifically sampling some subset of enterprise?

  • by mastazi on 10/1/18, 9:53 PM

    "The data doesn't match my personal preconceptions, so the methodology must be broken" Uhm yeah, nah.

  • by krschultz on 10/1/18, 9:56 PM

    The Stack Overflow developer survey is another data set that I trust.

    https://insights.stackoverflow.com/survey/2018#technology

  • by dcooper8 on 10/2/18, 4:15 AM

    Nor is GitHub a perfect indicator. Some languages promote their own open-source repository hubs, such as https://gitlab.common-lisp.net.

  • by pankajdoharey on 10/2/18, 2:05 AM

    I don't think these measurements are any good, or a true reflection of a language's usage in the industry. Some languages go totally under the radar, like Clojure/ClojureScript.