Gitbook: https://tigercosmos.github.io/lets-build-dbms/

Today, I am going to implement the lexical scanner for StellarSQL. It is quite a big piece of engineering, so I can only do part 1 today. I don’t even know how many parts there will be, but I will do my best.

There is an ISO standard for SQL, “ISO/IEC 9075”. On top of that, every DBMS has its own SQL syntax: it follows the standard and adds its own extension syntax. An extension works only on the DBMS that defines and implements it, and does not work on another one.

The full list of keywords is too long, and most of them are rarely used. The more syntax a DBMS supports, the more complicated it becomes. To keep StellarSQL simple, I use the keyword list from the W3Schools SQL Tutorial, which is a basic version.

keywords list

Basically, these keywords are enough for normal usage.

So, I define these keywords in file src/sql/symbol.rs.

!FILENAME src/sql/symbol.rs

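// One entry of the scanner's symbol table.
struct Symbol {
    name: String, // text of the symbol, e.g. "CREATE"
    len: u32,     // length of `name`
    token: Token, // which token this symbol maps to
    group: Group, // keyword or function
}

// Kind of symbol.
enum Group {
    Keyword,
    Function,
}

// All tokens the scanner needs to know.
enum Token {
    Add,
    AddConstraint,
    Alter,
    AlterColumn,
    AlterTable,
    All,
    And,
    Any,
    As,
    Asc,
    Between,
    Case,
    Check,
    // ...
    // ...
    // ...
    SelectTop,
    Set,
    Table,
    Top,
    TruncateTable,
    Union,
    UnionAll,
    Unique,
    Update,
    Values,
    View,
    Where,
}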

The Symbol structure stores the information for a token: its name, the length of the name, the token, and the group. For example, the Symbol for the “CREATE” keyword is Symbol { name: "CREATE".to_string(), len: 6, token: Token::Create, group: Group::Keyword }.

Token enumerates all the SQL keywords that the scanner needs to recognize.

Group classifies a symbol as either a keyword or a function.
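To make the three types more concrete, here is a minimal sketch of how a scanner might use them to recognize a word. It is not the actual StellarSQL code: the `Symbol::new` helper, the `build_symbol_table` and `lookup` functions, the `HashMap` storage, and the `Token::Count` variant are all assumptions made for illustration, and the Token enum is cut down to a handful of variants.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Group {
    Keyword,
    Function,
}

// Only a few variants for illustration; the real enum is much longer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Create,
    Select,
    Where,
    Count, // hypothetical function token, not taken from the original list
}

#[derive(Debug, Clone, PartialEq)]
struct Symbol {
    name: String,
    len: u32,
    token: Token,
    group: Group,
}

impl Symbol {
    // Hypothetical helper: derives `len` from the name.
    fn new(name: &str, token: Token, group: Group) -> Symbol {
        Symbol {
            name: name.to_string(),
            len: name.len() as u32,
            token,
            group,
        }
    }
}

// Hypothetical: build a table keyed by the upper-case symbol name.
fn build_symbol_table() -> HashMap<String, Symbol> {
    let symbols = vec![
        Symbol::new("CREATE", Token::Create, Group::Keyword),
        Symbol::new("SELECT", Token::Select, Group::Keyword),
        Symbol::new("WHERE", Token::Where, Group::Keyword),
        Symbol::new("COUNT", Token::Count, Group::Function),
    ];
    symbols.into_iter().map(|s| (s.name.clone(), s)).collect()
}

// SQL keywords are case-insensitive, so normalize the word before the lookup.
fn lookup<'a>(table: &'a HashMap<String, Symbol>, word: &str) -> Option<&'a Symbol> {
    table.get(&word.to_uppercase())
}

fn main() {
    let table = build_symbol_table();
    match lookup(&table, "select") {
        Some(sym) => println!("found {:?}", sym),
        None => println!("not a keyword or function"),
    }
}
```

A hash map keeps the lookup simple. A word that is not in the table returns `None`, which a scanner could then treat as an identifier.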

I have been studying the MySQL source code for more than 4 hours. That’s why it looks like I didn’t write much code. I will continue tomorrow.