Why not SQL?
SQL, or Structured Query Language, is a programming language that has dominated the database space for decades. Initially created 50 years ago in 1974 just two years after C. Despite its undeniable success its age means it has many less-than-ideal properties particularly unsuitable for a modern database(s) like the agdb
.
Yet another programming language
SQL is a Turing complete programming language which has grown to be vast, complex and complicated. When a programmer needs to access or manipulate a database using SQL they need to learn entire additional language in addition to the one in which they are writing their program. This cognitive (over)load is very often being solved by abstracting the “raw” SQL behind an ORM (Object-relational mapping) to seemingly use the database directly from the main programming language. The keyword here is seemingly because under the hood any ORM will simply translate to a “raw” SQL that quite often might not be as performant as when written by hand.
ORM often violates the famous zero-cost abstraction rule favorite in languages like C++ or Rust that states “the abstraction is as performant as if written by hand”.
The textual nature of SQL means that it can be used from any programming language with ease, but that is of little importance for an individual programmer writing it in their program. While they enjoy the benefits of their IDE and language server for their primary languages, there is not much that can be done about the embedded SQL statements in their code. Until they are run against a database it cannot even be known if even the basic syntax is correct (certain plugins do offer limited SQL support for syntax etc. but the ultimate truth is the particular implementation of the given database). This unpleasant experience is worsened further with advent of new programming tooling like Copilot where the main language support is better than ever while SQL support lags behind as it is nearly impossible to improve the experience in that area.
Interpreting an injection
SQL injection is one of the most prevalent attacks using SQL programming languages. This is made possible by two major properties:
- SQL is a textual language.
- SQL is a runtime-interpreted language.
There is a long Wikipedia page on this attack citing serious recent cases of data breaches due to it. This problem cannot be definitively solved because it stems from the fundamental properties of the language. As long as it must take user input and is effectively interpreted only during runtime this type of attack can and will happen. Various mitigation strategies while effective (e.g. stored procedures, database permissions) also increases complexity and cost of using the language.
Probably the closest thing to eliminate SQL injection attacks entirely is use of server-side stored procedures. Essentially a type of remotely executed pre-defined function. However, the procedure itself must take precautions and to sanitize the inputs. Making changes to remote procedures is also much more difficult than simply changing the SQL statement in your code.
Not to mention that even when the client code is perfectly fine and sanitized the SQL interpreted on the database side does not necessarily have to be (although it typically will be).
Complexity and performance
As mentioned the SQL is a regular programming language and as such it allows writing literally anything in it. Back in the 70s and decades after this property was very valuable because client machines (unlike database servers) were not particularly performant. Letting the server do the heavy lifting and massaging the data into the correct output made sense and often was the only viable approach. Unfortunately it can also lead to quite complex queries. Have you ever seen 10 000 line long SQL query? Yes, singular. And yes, 4 zeroes. Making changes to such a query will take many days spent on understanding it and figuring out how to change it.
Nowadays, client machines are so powerful that doing any extra work on any server is generally undesirable. The situation is pretty much completely reversed. Particularly embedding application/business logic in database queries can lead to hard to change and hard to maintain code because SQL is not the primary language of any application and is often not even very suitable for it in the first place (it is a domain specific language for interfacing with the databases).
Performance of SQL can never match compiled or even JIT compiled languages. Being interpreted means it must be parsed (in itself very non-trivial task that can lead to SQL injections mentioned above) and then translated into internal database structures and commands to be executed. And once more mitigation steps exist (like stored procedures again) but they are simply a patch over a fundamental property of the language.
An alternative
Despite all of its shortcomings the SQL is still the dominant database language and the agdb
will have to support it at some point to certain degree as well. Somewhat frustratingly there is no good
off the shelf alternative. Many no-SQL databases went on and created their own textual interpreted language immediately suffering from the very same issues described above but without any of the 50 years of hardening SQL has had. And the only benefit of being usable from “any” programming language is usually immediately thrown away by the creation of an ORM for each language.
So what is the alternative?
Object based queries. Having a binary compiled format eliminates most of the above problems. A binary query cannot morph into another because of some user input. The code, including the queries, is written entirely and only in the language of choice. The performance is unmatched on both sides as the database does not need to interpret the text into its own structures and everything is as expected (or results in a query error if not). Language server and IDE can easily help like with any other code. There can never be a syntax error. It does not lend itself very easily to a thousand lines long data queries and basically forces the application to process the data in itself rather than on the database server (beyond basic operations like sorting or filtering).
The cost of course is that the query system needs to be either written or generated for each language (initial cost) and that such a query system is not Turing complete programming language (which is perhaps more of a benefit). And that seems like a small price for everything else. Nearly everything has changed and after 50 years it is time for something better in the database space and to start treating SQL like we do the languages from the same era - replace them with better and safer alternatives.