Gitbook: https://tigercosmos.github.io/lets-build-dbms/

Finally, the scanner can identify multikeywords (keywords made of several words), such as `insert into` and `create table`.

The scanner's algorithm is straightforward. It first checks whether the current word could be the first word of a multikeyword. If it could, the scanner reads ahead for each possible keyword length (for a three-word keyword, it reads the following two words) and checks whether the combined string matches a multikeyword.
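The idea can be sketched at the word level, leaving out the character handling. This is a minimal sketch, not the code in lexer.rs: the `MULTI_KEYWORDS` table, the helper `candidate_lengths`, and the three-word entry `if not exists` are assumptions made for illustration, and keywords are assumed to be lowercase already.

```rust
/// Hypothetical keyword table (an assumption for this sketch).
/// `if not exists` is included only to show a three-word case.
const MULTI_KEYWORDS: [&str; 3] = ["insert into", "create table", "if not exists"];

/// Returns the possible word counts of multikeywords that start with `first`.
/// An empty vector means `first` cannot start a multikeyword.
fn candidate_lengths(first: &str) -> Vec<usize> {
    let mut lengths: Vec<usize> = MULTI_KEYWORDS
        .iter()
        .filter(|kw| kw.split(' ').next() == Some(first))
        .map(|kw| kw.split(' ').count())
        .collect();
    lengths.sort_unstable();
    lengths.dedup();
    lengths
}

/// Tries to match a multikeyword at the start of `words`.
/// On success, returns the keyword and how many words it covers.
fn match_multi_keyword(words: &[&str]) -> Option<(String, usize)> {
    let first = *words.first()?;
    for total in candidate_lengths(first) {
        if words.len() < total {
            continue; // not enough words left for this candidate length
        }
        // Join the first word with the following `total - 1` words.
        let candidate = words[..total].join(" ");
        if MULTI_KEYWORDS.iter().any(|kw| *kw == candidate.as_str()) {
            return Some((candidate, total));
        }
    }
    None
}
```

With this table, `match_multi_keyword(&["insert", "into", "users"])` yields the keyword `insert into` covering two words, while `match_multi_keyword(&["insert", "users"])` yields `None`, so the caller would treat `insert` as an ordinary word.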

The algorithm looks somewhat ugly, and I would like to refactor it later.

There are also tests for the scanner; take a look at the bottom of lexer.rs.

!FILENAME sql/lexer.rs

// if this word is possibly the start of a multikeyword, scan the following chars
match symbol::check_multi_keywords_front(word) {
    // parts<Vec[u32]>: how many parts each possible keyword has
    Some(parts) => {
        println!("The word `{}` might be a multikeyword", word);

        for keyword_total_parts in parts {
            println!("Assume this keyword has {} parts", keyword_total_parts);

            // copy the remaining chars for testing
            let mut test_chars = chars.as_str().chars();
            // candidate string for the multikeyword test; it already holds
            // the first word and a space (scanning starts from the next word)
            let mut test_str = format!("{} ", word);

            // whether the previous char was a letter (used to detect word ends)
            let mut is_last_letter = false;

            // number of chars consumed while testing; if the candidate matches
            // a multikeyword, the right cursor is shifted by this many steps
            let mut step_counter = 0;

            // how many words have been added to test_str after the first one;
            // if the keyword has 3 parts, following_parts should reach 2
            let mut following_parts = 0;

            loop {
                match test_chars.next() {
                    Some(y) => {
                        // a multikeyword consists of ASCII alphabetic characters only
                        if y.is_ascii_alphabetic() {
                            if !is_last_letter {
                                is_last_letter = true;
                            }
                            test_str.push(y);
                        } else {
                            match y {
                                ' ' | '\t' | '\r' | '\n' => {
                                    if is_last_letter {
                                        // from letter to space, count one word
                                        following_parts += 1;
                                        // found enough parts, break early
                                        if following_parts
                                            == keyword_total_parts - 1
                                        {
                                            break; // loop
                                        }
                                        // add ` ` between words
                                        test_str.push(' ');
                                        is_last_letter = false;
                                    }
                                }
                                // &, %, *, @, etc.
                                // keywords must be letters
                                _ => break, // loop
                            }
                        }
                    }
                    None => break, // loop
                }
                step_counter += 1;
            }

            println!("Checking `{}` ...", test_str);
            match symbol::SYMBOLS.get(test_str.as_str()) {
                // a multikeyword
                Some(token) => {
                    println!("Found keyword `{}`", test_str);
                    self.tokens.push(token.clone());

                    // shift the right cursor to the right of the multikeyword
                    self.pos.cursor_r += step_counter;
                    // skip the chars included in this multikeyword
                    for _ in 0..step_counter {
                        chars.next();
                    }

                    is_multi_keyword = true;
                    break; // parts
                }
                None => println!("`{}` is not a keyword", test_str),
            }
        }
    }
    None => {}
}
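The listing tests on a copy of the remaining characters and only advances the real iterator once a keyword is found. That look-ahead-then-commit pattern can be shown in isolation. The following is a minimal sketch under assumptions, not a refactoring of the author's code: the function `try_consume` and the `KEYWORDS` table (standing in for `symbol::SYMBOLS`) are hypothetical, and `is_ascii_whitespace` is used instead of the four explicit whitespace characters in the listing.

```rust
use std::str::Chars;

/// Hypothetical keyword table standing in for `symbol::SYMBOLS`.
const KEYWORDS: [&str; 2] = ["insert into", "create table"];

/// Looks ahead on a clone of `chars` for `extra_words` more words after `first`.
/// If the combined string is a keyword, advances `chars` past it and returns it.
/// Otherwise `chars` is left untouched and `None` is returned.
fn try_consume(first: &str, extra_words: usize, chars: &mut Chars<'_>) -> Option<String> {
    let mut probe = chars.clone(); // the real iterator stays where it is
    let mut candidate = String::from(first);
    let mut consumed = 0;

    for _ in 0..extra_words {
        let mut word = String::new();
        // Peek through a clone so the delimiter after the word is not consumed.
        while let Some(c) = probe.clone().next() {
            if c.is_ascii_alphabetic() {
                word.push(c);
            } else if !(c.is_ascii_whitespace() && word.is_empty()) {
                break; // end of the word, or a character no keyword contains
            }
            probe.next();
            consumed += 1;
        }
        if word.is_empty() {
            return None; // no further word available
        }
        candidate.push(' ');
        candidate.push_str(&word);
    }

    if KEYWORDS.iter().any(|kw| *kw == candidate.as_str()) {
        // Commit: skip the chars that belong to the keyword.
        for _ in 0..consumed {
            chars.next();
        }
        Some(candidate)
    } else {
        None
    }
}
```

For example, if `chars` holds ` into users` after the word `insert` was read, `try_consume("insert", 1, &mut chars)` returns `insert into` and leaves ` users` in `chars`. With ` users` as the remaining input, it returns `None` and `chars` is unchanged.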