Gitbook: https://tigercosmos.github.io/lets-build-dbms/

Until now, StellarSQL has been able to receive message from and send answer back to clients. That’s good. It is a server, but it still cannot do works of database management.

Further, only with SQL parser, storage engine, and data engine, StellarSQL could be called a real DBMS.

These modules should be implemented in order, so let’s talk about the SQL parser first.

For any message sent from client, it must be a query in SQL format. The query could be “create table”, “insert data”, “delete date”, etc. In order to understand the query, we need a SQL parser.

Generally speaking, DBMS has a parser includes a lexical scanner and a grammar rule module. The lexical scanner splits the entire query into tokens (keywords or domain name), and the grammar rule module finds a combination of SQL grammar rules that produce this sequence, and process the code associated with those rules.

For example:

1
SELECT * FROM Customers WHERE Country = 'Mexico';

The lexical scanner breaks a SQL query above into tokens as:

  • SELECT
  • *
  • FROM
  • Customers
  • WHERE
  • Country
  • =
  • Mexico

A semicolon means the end of a query, so not it is not counted here.

Furthermore, the grammar rule module applies rules for these tokens, such as:

  • SELECT keyword is before columns
  • From keyword is before tables
  • Where keyword stands for conditions

A mature DBMS also does lots of optimizations. As you can imagine, the complexity of a SQL query requires an equally complex structure that efficiently stores the information needed for executing every possible SQL statement.

For example, according to the book, “Understanding MySQL Internals“, MySQL optimizer does some important tasks (I just list a few):

  • Determine which keys can be used to retrieve the records from tables, and choose the best one for each table.

  • Determine the order in which tables should be joined when more than one table is present in the query.

  • Eliminate unused tables from the join.

  • Determine whether keys can be used for ORDER BY and GROUP BY.

In the next days, I am going to implement the SQL parser. However, I am afraid that I could only implement it with basic algorithm. The optimizer is a huge engineering, not to mention my lack of time due to preparing midterm exams.

The good news is that StellarSQL can parse SQL soon. :)

Reference