Lexical Scanner Implementation (3)
- 2018-10-24
- Liu, An-Chi 劉安齊
Gitbook: https://tigercosmos.github.io/lets-build-dbms/
I found a cool project – TiDB, which has more than 15k stars.
TiDB is an open-source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database. It features infinite horizontal scalability, strong consistency, and high availability. TiDB is MySQL compatible and serves as a one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads. – From Github
I found this project through the article *"How TiDB SQL Parser implement (TiDB SQL Parser 的实现)"* (written in Chinese). The article introduces TiDB's parser, and it is quite helpful for me. The parser source code is at pingcap/parser. I partly refer to lexer.go, which implements the scanner, and misc.go, which implements the token identifier with a trie.
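To make the trie idea concrete, here is a minimal sketch of a keyword lookup in Go. It is only an illustration and not TiDB's actual code: the token constants, `trieNode`, `insert`, and `lookup` are all hypothetical names, and it assumes keywords are plain ASCII and matched case-insensitively.

```go
package main

// Hypothetical token kinds for this sketch; they are not TiDB's constants.
const (
	tokIdentifier = iota // anything that is not a known keyword
	tokSelect
	tokFrom
)

// trieNode is one node of a trie keyed by upper-cased ASCII bytes.
type trieNode struct {
	children map[byte]*trieNode
	terminal bool // true if a keyword ends at this node
	token    int  // token kind, meaningful only when terminal is true
}

func newTrieNode() *trieNode {
	return &trieNode{children: make(map[byte]*trieNode)}
}

// toUpper folds an ASCII letter to upper case so lookups ignore case.
func toUpper(c byte) byte {
	if c >= 'a' && c <= 'z' {
		return c - 'a' + 'A'
	}
	return c
}

// insert registers keyword in the trie and associates it with token.
func (n *trieNode) insert(keyword string, token int) {
	cur := n
	for i := 0; i < len(keyword); i++ {
		c := toUpper(keyword[i])
		next, ok := cur.children[c]
		if !ok {
			next = newTrieNode()
			cur.children[c] = next
		}
		cur = next
	}
	cur.terminal = true
	cur.token = token
}

// lookup returns the keyword token for lit, or tokIdentifier when lit is
// not exactly one of the registered keywords.
func (n *trieNode) lookup(lit string) int {
	cur := n
	for i := 0; i < len(lit); i++ {
		next, ok := cur.children[toUpper(lit[i])]
		if !ok {
			return tokIdentifier
		}
		cur = next
	}
	if cur.terminal {
		return cur.token
	}
	return tokIdentifier
}
```

After `root := newTrieNode()` and `root.insert("SELECT", tokSelect)`, a call such as `root.lookup("select")` yields `tokSelect`, while a prefix like `"sel"` or a longer word like `"selected"` falls back to `tokIdentifier`. A plain map would also work for whole-word lookup; the trie becomes more useful when the scanner wants to match character by character as it reads.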
Let's see some snippets from TiDB:
!FILENAME pingcap/parser/lexer.go
// Scanner implements the yyLexer interface.
!FILENAME pingcap/parser/misc.go
func (s *Scanner) isTokenIdentifier(lit string, offset int) int {
It is interesting and enlightening to read this code rather than just copy and paste it. TiDB uses Yacc to find the hierarchical structure of the program. The Scanner struct follows the lexer interface that Yacc requires. However, I will implement everything myself without any other tools.
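For reference, here is a rough sketch of what "following the interface of Yacc" means in Go. It assumes a goyacc-style generated parser, which calls `Lex` to pull one token at a time and `Error` to report a syntax error. The `yySymType` fields, the token codes, and `SimpleScanner` are hypothetical and much simpler than TiDB's real Scanner; identifiers here are letters and underscores only.

```go
package main

// yySymType stands in for the type goyacc generates from the grammar's
// %union declaration. The single field is an assumption for this sketch.
type yySymType struct {
	ident string
}

// yyLexer is the shape of the interface a goyacc-generated parser expects.
type yyLexer interface {
	Lex(lval *yySymType) int
	Error(s string)
}

const (
	tokEOF   = 0    // 0 signals end of input to the parser
	tokIdent = 1000 // hypothetical code, above 255 so single bytes stay usable
)

// SimpleScanner is a hypothetical scanner that satisfies yyLexer.
type SimpleScanner struct {
	input  string
	pos    int
	errors []string
}

// Compile-time check that *SimpleScanner implements yyLexer.
var _ yyLexer = (*SimpleScanner)(nil)

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

func isLetter(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// Lex returns the next token code and stores its text in lval if needed.
func (s *SimpleScanner) Lex(lval *yySymType) int {
	for s.pos < len(s.input) && isSpace(s.input[s.pos]) {
		s.pos++
	}
	if s.pos >= len(s.input) {
		return tokEOF
	}
	c := s.input[s.pos]
	if isLetter(c) {
		start := s.pos
		for s.pos < len(s.input) && isLetter(s.input[s.pos]) {
			s.pos++
		}
		lval.ident = s.input[start:s.pos]
		return tokIdent
	}
	// Any other character is returned as its own byte value.
	s.pos++
	return int(c)
}

// Error records a message reported by the parser.
func (s *SimpleScanner) Error(msg string) {
	s.errors = append(s.errors, msg)
}
```

With `s := &SimpleScanner{input: "a , bc"}`, repeated calls to `s.Lex(&lval)` yield `tokIdent`, then `int(',')`, then `tokIdent` again, and finally `tokEOF`. A hand-written parser does not need this exact interface, but keeping a similar "give me the next token" method makes the scanner easy to drive.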
I spent too much time reading the TiDB source code, so I only programmed a little today. I think it's fine to show the code tomorrow, when it will be more complete.