from Hacker News

Table Oriented Programming (2002)

by mabynogy on 12/30/21, 9:59 AM with 87 comments

by jasode on 12/30/21, 11:46 AM
I remember reading this essay when it first came out. To try and reword it using modern terms: The author wishes that programming languages had database persistence capabilities as 1st-class built-in syntax instead of cumbersome bolted-on API functions.
Examples where database syntax (i.e. SQL syntax) is 1st-class, without the noisy syntax of function calls, without command strings in quotes, etc.:
- business languages like COBOL
- programming languages in ERP systems like SAP ABAP, Oracle Financials
- stored-procedure languages inside the RDBMS engine such as T-SQL in MS SQL Server, PL/SQL in Oracle, stored procedures in MySQL
In the above, the "database" is the world the programming language is working in.
The more general-purpose programming languages like C++, Java, JavaScript, and Python omit DB manipulation as a core language feature. This 2nd-class status requires libraries, often 3rd-party, which means extra ceremonial syntax: #includes, imports, function calls with parentheses, etc. Some try to reduce the cumbersome syntax friction with ORMs. In contrast, in something like SAP ABAP the so-called "ORM" is already built in, so data tables can be processed without any friction.
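As a rough illustration of that ceremony, here is a minimal sketch using the sqlite3 module from Python's standard library. The table and column names are invented for the example; the point is only that the query lives in a quoted string the host language cannot check.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers (name, balance) VALUES (?, ?)",
    [("Ada", 120.0), ("Bob", -15.5), ("Cy", 0.0)],
)

# The query is an opaque string: Python cannot verify the table
# or column names inside it before the statement is executed.
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE balance < ? ORDER BY name",
    (100.0,),
).fetchall()

for name, balance in rows:
    print(f"{name}: {balance:.2f}")
conn.close()
```

The import, the connection object, the method calls, and the quoted command strings are the kind of friction a language with built-in table syntax would not need.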
The author works a lot on CRUD apps so a language that has inherent db syntax would enable "Table-Oriented-Programming".
But we can also twist the author's thesis around. A programmer is coding in SAP ABAP or a stored-procedure in MySQL and wonders why raw memory access where contiguous blocks of RAM can be changed with pointers is not easy to code in those languages. So an essay is written about advantages of "pointer-oriented-programming" because direct memory writes are really convenient for frame buffers in video games, etc.
In any case, I don't see any trend where a general-purpose programming language will include DB SQL as 1st-class. Even recent languages like Rust and Zig don't have basic SQLite3 db persistence as convenient built-in syntax. If anyone proposed adding such db syntax, the language maintainers would most likely reject it.
by sethhovestol on 12/30/21, 11:45 AM
I actually work in a table-oriented language, Harbour, a child of the Clipper/xBase family mentioned in the article. There are a few issues I've found with a table-oriented architecture:
1. Managing state is a bit of a nightmare. Harbour is based off of DBF databases, which are essentially flat files of a 2d array, and keeps your record number within any given db. You can then query a field with the arrow operator (table->field) but you have no guarantee that any subfunction is not changing state.
2. DBMS lock-in. Because you're operating in a totally different paradigm, moving DBs is actually rather challenging. Harbour has a really nice system of replaceable database drivers (RDDs), but when your code is all written assuming movement through a flat file, switching to a SQL-based system is hard. I'm currently in the process of writing an RDD to switch us to Postgres, but translating the logic of holding state into the paradigm of gathering data and then operating on it, in an established code base, is quite a challenge.
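The state problem in point 1 can be sketched in Python. This is not Harbour code: the WorkArea class and find_by_city helper are hypothetical stand-ins for a DBF-style work area with an implicit current-record pointer.

```python
class WorkArea:
    """Hypothetical stand-in for a DBF-style work area with a current record pointer."""

    def __init__(self, rows):
        self.rows = rows
        self.recno = 0  # shared, implicit position

    def goto(self, n):
        self.recno = n

    def field(self, name):
        # Rough analogue of table->field: reads from the current record.
        return self.rows[self.recno][name]


customers = WorkArea([
    {"name": "Ada", "city": "London"},
    {"name": "Bob", "city": "Oslo"},
])


def find_by_city(area, city):
    # Helper that searches by moving the shared record pointer
    # and never restores it afterwards.
    for i, row in enumerate(area.rows):
        area.goto(i)
        if row["city"] == city:
            return True
    return False


customers.goto(0)
before = customers.field("name")   # reads record 0
find_by_city(customers, "Oslo")    # silently repositions the work area
after = customers.field("name")    # now reads record 1
print(before, after)
```

Nothing in the call to find_by_city tells the caller that the same field expression will return a different record afterwards, which is the lack of guarantee described above.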
by mamcx on 12/30/21, 2:55 PM
For people like me, who worked in FoxPro, this is the dream.
Despite the claim that this kind of tool is for "basic CRUD", they could do much more, much better, precisely because they can deal MUCH better with the most challenging kind of programming:
CRUD apps.
Making apps in finance, ERPs, business, etc. is far more complex and challenging than building chat apps, where the scope is MORE clear and the features fewer.
"Simple" CRUD apps NEVER stay simple.
NEVER.
If you allow it, in no time you are building a mix of your own RDBMS, programming language, API orchestration, authorization framework, inference engines, hardware interfaces and more...
Then it must run on "Windows, Linux, Mac, Android, iOS, Web, Raspberry, that computer that is only known here in this industry", "please?"... and it will also chase all the fads, all the time.
The request/feature pipeline never ends. The info about what to do is sketchy at best.
The turnaround to bring results is measured in HOURS/DAYS.
So, no.
No language without this is, in fact, good for the niche.

by chrisaycock on 12/30/21, 1:31 PM

I built my own table-oriented language out of frustrations I had with time-series analysis:

Empirical has statically typed Dataframes. It can infer the type of a file's contents at compile time using a ton of metaprogramming techniques.

  >>> let trades = load("trades.csv")
  
  >>> trades
   symbol                  timestamp    price size
     AAPL 2019-05-01 09:30:00.578802 210.5200  780
     AAPL 2019-05-01 09:30:00.580485 210.8100  390
      BAC 2019-05-01 09:30:00.629205  30.2500  510
      CVX 2019-05-01 09:30:00.944122 117.8000 5860
     AAPL 2019-05-01 09:30:01.002405 211.1300  320
     AAPL 2019-05-01 09:30:01.066917 211.1186  310
     AAPL 2019-05-01 09:30:01.118968 211.0000  730
      BAC 2019-05-01 09:30:01.186416  30.2450  380
      CVX 2019-05-01 09:30:01.639577 118.2550 2880
      ...                        ...      ...  ...

Functions have generic typing by default; the caller determines the type instantiation. Here is a weighted average:

  >>> func wavg(ws, vs) = sum(ws * vs) / sum(ws)

Queries are built into the language. Here is a five-minute volume-weighted average price:

  >>> from trades select vwap = wavg(size, price) by symbol, bar(timestamp, 5m)
   symbol           timestamp       vwap
     AAPL 2019-05-01 09:30:00 210.305724
      BAC 2019-05-01 09:30:00  30.483875
      CVX 2019-05-01 09:30:00 119.427733
     AAPL 2019-05-01 09:35:00 202.972440
      BAC 2019-05-01 09:35:00  30.848397
      CVX 2019-05-01 09:35:00 119.431601
     AAPL 2019-05-01 09:40:00 204.671388
      BAC 2019-05-01 09:40:00  30.217362
      CVX 2019-05-01 09:40:00 117.224763
      ...                 ...        ...

Everything is statically typed. Misspelled column names, for example, result in an error before the script is even run!
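
For readers without Empirical at hand, here is a rough Python equivalent of the wavg function and the five-minute VWAP query, using only the standard library. It is a sketch of the same computation, not of how Empirical works internally; the bar helper is an assumed reimplementation, and the data is the first few rows of the sample above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def wavg(ws, vs):
    """Weighted average of vs using weights ws."""
    return sum(w * v for w, v in zip(ws, vs)) / sum(ws)


def bar(ts, width):
    """Round a timestamp down to the start of its bar of the given width."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // width) * width


# (symbol, timestamp, price, size), copied from the sample rows above.
trades = [
    ("AAPL", datetime(2019, 5, 1, 9, 30, 0, 578802), 210.52, 780),
    ("AAPL", datetime(2019, 5, 1, 9, 30, 0, 580485), 210.81, 390),
    ("BAC", datetime(2019, 5, 1, 9, 30, 0, 629205), 30.25, 510),
    ("CVX", datetime(2019, 5, 1, 9, 30, 0, 944122), 117.80, 5860),
]

# Group sizes and prices by (symbol, five-minute bar).
groups = defaultdict(lambda: ([], []))
for symbol, ts, price, size in trades:
    sizes, prices = groups[(symbol, bar(ts, timedelta(minutes=5)))]
    sizes.append(size)
    prices.append(price)

vwap = {key: wavg(sizes, prices) for key, (sizes, prices) in groups.items()}
for (symbol, start), value in sorted(vwap.items()):
    print(symbol, start, round(value, 4))
```

In this Python version a misspelled field would only surface at run time, which is the contrast with the compile-time checking described above.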

by zokier on 12/30/21, 12:44 PM
I'm of the opinion that tables would make a lot of sense as first-class citizens in shell environments. Lots of data typically handled in shells is inherently tabular in nature (for example the outputs of ls and ps) and some of the common tools are also intended for tables (awk at the forefront, but also cut and sort). But in practice a lot of it is currently very ad hoc, and handles edge cases poorly.
osquery already demonstrates that a lot of info can be structured into tables, but what I feel is missing is a more convenient, shell-like language environment to work with such data.
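As a small sketch of what treating command output as a table looks like today, the Python below parses ps-like text into rows and then filters and sorts them. The sample text and the parse_table helper are made up for illustration, and no real command is executed.

```python
# Made-up sample in the style of ps output.
PS_OUTPUT = """\
PID USER   %CPU COMMAND
1   root   0.0  init
412 alice  12.5 python
87  bob    3.1  vim
"""


def parse_table(text):
    """Split whitespace-aligned text into a list of dict rows keyed by header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    # maxsplit keeps a final column that contains spaces in one piece.
    return [dict(zip(header, line.split(None, len(header) - 1))) for line in lines[1:]]


rows = parse_table(PS_OUTPUT)

# Every value is still a string, so numeric columns need explicit conversion.
busy = sorted(
    (r for r in rows if float(r["%CPU"]) > 1.0),
    key=lambda r: float(r["%CPU"]),
    reverse=True,
)
for r in busy:
    print(r["PID"], r["USER"], r["COMMAND"])
```

This parser breaks as soon as a middle column contains a space, and it knows nothing about column types, which is the kind of ad hoc fragility a first-class table type in the shell would remove.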
by DemocracyFTW on 12/30/21, 9:50 PM
> The proliferation of field types has made data more difficult to transfer or share data between different applications and generates confusion. ITOP has only two fundamental data types: numeric and character, and perhaps a byte type for conversion purposes. (I have been kicking around ideas for having only one type.) The pre- and post-validators give any special handling needed by the field. A format string can be provided for various items like dates ("99/99/99"), Social-Security-Numbers ("999-99-9999"), and so forth. (Input formats are not shown in our sample DD.) Types like dates and SSN's can be internally represented (stored) just fine with characters or possibly integers. For example, December 31, 1998 could be represented as "19981231". This provides a natural sort order.
This is very nineties and I must disagree. The datetime-as-string example shows it most clearly: sorting by full date is only one of the things you want to do with calendar data. Often you will want to compare, say, things that happened on Mondays vs things that happened over the weekend, or things that happened within so-and-so many hours of a given point in time, and so on, not to mention the complexities of DST and timezones. You can do all that with text-based strings, but you'd have to write quite a bit of logic that gets applied to strings over and over again, or else store the results of parsing a date string in separate fields. Dates expressed as text also don't allow you to validate "19990229" or "20020631" in a very straightforward manner.
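A short Python sketch of the difference a real date type makes, using the three strings mentioned above. The parse_yyyymmdd helper is a name invented for this example; datetime.strptime is the standard-library call doing the work.

```python
from datetime import datetime


def parse_yyyymmdd(text):
    """Return a date for a valid YYYYMMDD string, or None if it is not a real date."""
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        return None


# 1999 was not a leap year and June has 30 days, so the last two are rejected.
for raw in ("19981231", "19990229", "20020631"):
    print(raw, parse_yyyymmdd(raw))

d = parse_yyyymmdd("19981231")
is_weekend = d.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
print(d.strftime("%A"), is_weekend)
```

Once the value is a date, validation and weekday questions come for free, whereas the string form only gives the sort order.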
I think our collective and by now decades-old experience with duck/weakly-typed languages like Python, JavaScript, Ruby and so on clearly shows that what you gain in simplicity you lose in terms of assured correctness.
by kerblang on 12/30/21, 2:31 PM
This idea of "the database should be invisible!" abstraction was widely pursued back in the '90s when people were still obsessed with "the network should be invisible!" and Remote Procedure Calls (RPC). A lot of ORMs still reflect this obsession, and some programmers still get angry that they should have to deal with "this low-level SQL nonsense!"
Attempts to make I/O invisible failed and failed and failed again, and continue failing and failing again because it turns out that I/O is incredibly fundamental and not something you can just wave off as "low-level details". A networked database is a massive abstraction in its own right, and if invisible I/O is a doomed abstraction, forget invisible databases. Well, first go fail a few more times, then forget it, because we're not quite there yet on this one, are we...
The bigger the abstraction, the more it leaks. Sometimes you have enough headroom to go further, and sometimes you have to recognize that you've gone way too far.
by scotty79 on 12/30/21, 2:18 PM
> Fundamental and Consistent Collection Operations
I recently discovered that the Scala collections library was designed with this exact goal in mind.
The interface of the collections is highly consistent across the various types, and you can create custom collections using the same interface with very little custom code.
I found this very insightful https://docs.scala-lang.org/overviews/core/architecture-of-s...
The Slick library pretty much turns database access into a first-class part of Scala through this collections API
https://scala-slick.org/doc/3.3.3/introduction.html#what-is-...
by bob1029 on 12/30/21, 2:23 PM
We do a thing where we project all of the domain state (i.e. for a given user's session/work) into an in-memory database and then execute the business's SQL queries against it in order to determine logical outcomes.
I wouldn't really call it low/no code, since developing effective queries is non-trivial for many cases, but it does make it much more feasible for a non-developer to add incremental value to our product.
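A minimal sketch of that pattern, assuming SQLite as the in-memory database and an invented free-shipping rule. The session dictionary, table names, and rule text are all hypothetical; the comment does not say which database or schema is actually used.

```python
import sqlite3

# Hypothetical session state as it might exist in application memory.
session = {
    "user": {"id": 7, "tier": "gold"},
    "cart": [("widget", 3, 9.99), ("gadget", 1, 24.50)],
}

# Project the domain state into a throwaway in-memory database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, tier TEXT)")
db.execute("CREATE TABLE cart (item TEXT, qty INTEGER, unit_price REAL)")
db.execute(
    "INSERT INTO users VALUES (?, ?)",
    (session["user"]["id"], session["user"]["tier"]),
)
db.executemany("INSERT INTO cart VALUES (?, ?, ?)", session["cart"])

# A business rule kept as SQL text, which could be edited without touching code.
FREE_SHIPPING_RULE = """
SELECT (SELECT tier FROM users) = 'gold'
       AND (SELECT SUM(qty * unit_price) FROM cart) >= 50
"""

free_shipping = bool(db.execute(FREE_SHIPPING_RULE).fetchone()[0])
print("free shipping:", free_shipping)
db.close()
```

The application code only projects state and reads back a result; the rule itself is data.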
by kragen on 12/30/21, 11:36 AM
I'm glad this got posted! I wanted to reread this a couple of years ago and couldn't find it. Any idea what happened to TopMind?
by gpderetta on 12/30/21, 1:00 PM
My wish-list for my ideal (non-system) programming language:
- first class tables and named tuples as the primary datastructure. Includes the full set of relational operations, and transaction support. Optional persistence. Not everything is a table, though. Tables are great but pragmatism trumps dogmatism.
- structural typing (ties neatly with the above) and support for row polymorphism
- shared nothing, distributed, multiprocessing, except for explicitly shared tables, as transactions allow for safe controlled mutation of shared tables. Messages are just named tuples and row polymorphism should allow for protocol evolution. Message queues and streams can be abstracted as one-pass tables.
- Async as in Cilk not JS. No red/green functions. Multiprocessing can be cheap, just spawn a user thread. The compiler will use whatever compilation strategy is the best (cactus stacks, full CPS transform, whatever).
- seamless job management, pipelines, graphs. Ideally this language should be a perfectly fine shell replacement. But with transparent support for running processes on multiple machines. And better error management.
A bit more nebulous and needs more thoughts:
- exceptions, error codes and optional/variant results are all sides of the same coin and can look the same with the right syntactic sugar.
- custom table representation. You can optionally decide how your table should be physically represented in memory or disk. Explicit pointers to speed up joins. Nested representation for naturally hierarchical data. Denormalized
- first class graphs. Graphs and relational tables are dual. And with the above point it should be possible to represent them efficiently. What operations do we need?
- capabilities. All dependencies are passed to each function, no global data and code. You can tell if your function does IO or allocates by looking at its signature. Subsumes dependency injection. Implicit parameters and other syntactic sugar should make this bearable.
- staged compilation via partial evaluation. This should subsume macros. Variables are a tuple of (value, type), where type is a dictionary of operation-name->operation-implementation. First stage is a fully dynamic language, but by fixing the shape of the dictionary you get interfaces/traits/protocol with dynamic dispatch, by fixing the implementation you get static dispatch. Again, significant sugar is needed to make this workable.
edit:
missed an important element: - transparent remote code execution: run your code where your data is. Capabilities are pretty much a requirement for security.
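To make the first wish-list item a little more concrete, here is a rough Python approximation of tables as lists of named tuples with a few relational operations. It is only a library-level sketch with invented names (Employee, Dept, select, project, join); the wish is for these to be language features with transactions and persistence, which this does not attempt.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    dept_id: int


class Dept(NamedTuple):
    dept_id: int
    title: str


employees = [Employee("Ada", 1), Employee("Bob", 2), Employee("Cy", 1)]
depts = [Dept(1, "Research"), Dept(2, "Sales")]


def select(table, predicate):
    """Relational selection: keep rows matching the predicate."""
    return [row for row in table if predicate(row)]


def project(table, *fields):
    """Relational projection: keep only the named fields, dropping duplicates."""
    seen = dict.fromkeys(tuple(getattr(row, f) for f in fields) for row in table)
    return list(seen)


def join(left, right, key):
    """Natural join on one shared field, returning plain dict rows."""
    index = {}
    for rrow in right:
        index.setdefault(getattr(rrow, key), []).append(rrow)
    return [
        {**lrow._asdict(), **rrow._asdict()}
        for lrow in left
        for rrow in index.get(getattr(lrow, key), [])
    ]


research = select(join(employees, depts, "dept_id"), lambda row: row["title"] == "Research")
print([row["name"] for row in research])
print(project(employees, "dept_id"))
```

Note that join has to fall back to plain dicts because Python cannot derive the combined row type, which is roughly where structural typing and row polymorphism would help.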
by Avshalom on 12/30/21, 2:47 PM
This idea (or at the least nostalgia for xBase) pops up every now and then and while it certainly isn't describing Prolog I think the idea would be a lot more interesting if the authors had enough familiarity to compare and contrast.
by Animats on 12/30/21, 5:21 PM
Oh, that kind of table. I was expecting decision tables.[1]
"Smart contracts" for Ethereum should have been decision tables. But no, they had to make it Turing-complete. A good thing about decision tables is that there's a finite and small number of cases, so they can be exhaustively tested. Also, they're readable. That's what you want for contracts. Not Solidity programs, which are expensively insecure.
[1] https://en.wikipedia.org/wiki/Decision_table
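A small Python sketch of why a decision table can be exhaustively tested. The escrow conditions and actions are invented for illustration and are not taken from any real contract.

```python
from itertools import product

# Hypothetical escrow rule: conditions are (paid, delivered, disputed).
DECISION_TABLE = {
    (True, True, False): "release_funds",
    (True, True, True): "arbitrate",
    (True, False, False): "hold",
    (True, False, True): "refund",
    (False, True, False): "invoice",
    (False, True, True): "arbitrate",
    (False, False, False): "wait",
    (False, False, True): "cancel",
}


def decide(paid, delivered, disputed):
    """Look up the action; there is no branching logic to get wrong."""
    return DECISION_TABLE[(paid, delivered, disputed)]


# Exhaustive check: every combination of the three conditions has an entry.
all_cases = list(product([True, False], repeat=3))
missing = [case for case in all_cases if case not in DECISION_TABLE]
print(len(all_cases), "cases,", len(missing), "missing")
```

With three boolean conditions there are only eight cases, so the whole behavior can be read and checked at a glance.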
by abss on 12/30/21, 11:40 AM
I remember this page from GeoCities... It opened my eyes to some ugly aspects of OOP. But without proper marketing and some luck, a lot of ideas have to be rediscovered again and again. And maybe the table-oriented programming ideas are too common-sense and therefore not a good kind of differentiator compared with other smart ppl...
by slowmovintarget on 12/30/21, 9:46 PM
I recall debating this on Slashdot back in 2002. (I was a Bertrand Meyer OO convert back then). Good memories.
Functions and data are like spacetime and gravity. Beneath the emergent behavior in any software system, they are the things you find lurking underneath.
by teleforce on 12/31/21, 1:24 AM
Just wondering about the meaning of the statements from the article "Arrays are evil! Arrays are the Goto of the collections world". Does anyone know exactly what it means? Is it referring to raw arrays with pointers in C/C++ or to arrays in C++ collections?
by RandyRanderson on 12/31/21, 1:56 AM
I stopped reading shortly after:
"a = (b * c) + e + f"
Something like this would have been a better ex:
a = b(c+e) + f
This guy maybe hasn't heard of operator overloading as no one would do as he suggests in most 'OOP' languages:
"a = ((b.times(c)).plus(e)).plus(f) // sillier"
by tomcooks on 12/30/21, 10:33 PM
Might be due to personal preferences, but after having worked on a legacy TOP codebase I must unapologetically say that it sucks.
by rsrsrs86 on 12/30/21, 11:20 PM
Surprised no one mentioned the relationship of this to Alloy