Gitbook: https://tigercosmos.github.io/lets-build-dbms/

I found a cool project – TiDB, which has more than 15k stars.

TiDB is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads. – From Github

I found this project because an article *”How TiDB SQL Parser implement (TiDB SQL Parser 的实现)”*(Written in Chinese). The article introduces the parser of TiDB, and it’s quite helpful for me. The parser source code is at pingcap/parser. I refer partly from lexer.go, which implement the scanner, and misc.go, which implements the token identifier with trie.

Let’s see some snippets from TiDB:

!FILENAME pingcap/parser/lexer.go

1
2
3
4
5
6
7
8
9
10
11
12
13
// Scanner implements the yyLexer interface.
type Scanner struct {
r reader
buf bytes.Buffer

errs []error
stmtStartPos int

// For scanning such kind of comment: /*! MySQL-specific code */ or /*+ optimizer hint */
specialComment specialCommentScanner

sqlMode mysql.SQLMode
}

!FILENAME pingcap/parser/misc.go

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
func (s *Scanner) isTokenIdentifier(lit string, offset int) int {
// An identifier before or after '.' means it is part of a qualified identifier.
// We do not parse it as keyword.
if s.r.peek() == '.' {
return 0
}
if offset > 0 && s.r.s[offset-1] == '.' {
return 0
}
buf := &s.buf
buf.Reset()
buf.Grow(len(lit))
data := buf.Bytes()[:len(lit)]
for i := 0; i < len(lit); i++ {
if lit[i] >= 'a' && lit[i] <= 'z' {
data[i] = lit[i] + 'A' - 'a'
} else {
data[i] = lit[i]
}
}

checkBtFuncToken, tokenStr := false, string(data)
if s.r.peek() == '(' {
checkBtFuncToken = true
} else if s.sqlMode.HasIgnoreSpaceMode() {
s.skipWhitespace()
if s.r.peek() == '(' {
checkBtFuncToken = true
}
}
if checkBtFuncToken {
if tok := btFuncTokenMap[tokenStr]; tok != 0 {
return tok
}
}
tok := tokenMap[tokenStr]
return tok
}

It is interesting and enlightening to read these code, but not just copy-and-parse. TiDB use Yacc to find the hierarchical structure of the program. The Scanner structure is following the interface of Yacc. However, I would implement all by myself without any other tools.

I spend too much time reading TiDB source code, so I just program a little bit today. I think it’s fine that I show the code tomorrow, and it would be more complete.