MCC(My Compiler Collection)是我自己写的C语言编译器。MCC并不是按照C标准文法实现的,而是进行了修改。本文介绍MCC的文法。
translation-unit: external-declaration translation-unit external-declaration
translation-unit是整个C语言文件,它由多个外部声明external-declaration组成。变量的定义、函数的声明、函数的定义统统当作外部声明处理。例如下面就是4个外部声明:
static int a;
char arr[20];
int hello(int a,char b);
typedef int * AstTypeName;
typedef int (* ARR)[3];
int main(int argc, char const *argv[])
{
}
external-declaration: common-header compound-statement common-header ; common-header: declaration-specifiersopt init-declarator-listopt
external-declaration的第一行是函数定义,compound-statement代表函数体,这部分暂且不管。第二行表示变量或函数的声明。由于这两部分前面都一样,所以抽出公共部分构成common-header。所以common-header所代表的部分是这些:
static int a
char arr[20]
int hello(int a,char b)
typedef int * AstTypeName
typedef int (* ARR)[3]
int main(int argc, char const *argv[])
下面来看一下common-header的第一部分declaration-specifiers:
declaration-specifiers: storage-class-specifier declaration-specifiersopt type-specifier declaration-specifiersopt type-qualifier declaration-specifiersopt storage-class-specifier: one of auto register static extern typedef type-specifier: one of void char short int long float double signed unsigned struct-OR-union-specifier typedef-name type-qualifier: one of const volatile typedef-name: identifier
不难看出declaration-specifiers就是一个或多个void、char、 short、 int、 long、 float 等关键字。其中struct-OR-union-specifier相对复杂一些:
struct-OR-union-specifier: struct-OR-union identifieropt { struct-declaration-list } struct-OR-union identifier struct-OR-union: one of struct union struct-declaration-list: struct-declaration struct-declaration-list struct-declaration struct-declaration: specifier-qualifier-list struct-declarator-list ; specifier-qualifier-list: type-specifier specifier-qualijfier-listopt type-qualifier specifier-qualifier-listopt struct-declarator-list: declarator struct-declarator-list, declarator
当这些关键字解析完后,common-header的剩余部分为:
a
arr[20]
hello(int a,char b)
* AstTypeName
(* ARR)[3]
main(int argc, char const *argv[])
以上就是common-header的第二部分init-declarator-list的内容,文法为:
init-declarator-list: init-declarator init-declarator-list , init-declarator init-declarator: declarator declarator = initializer
init-declarator-list表达的意思是多个声明可以连在一起写,用逗号隔开。init-declarator表达的意思是声明是由带等号的或是不带等号的组成。这里的文法可以发现一个漏洞,就是函数声明/定义后面接等号也被允许,这个问题放到语意检查中解决。我们的例子中没有用逗号连写的声明,也没有=号,所以declarator所代表的代码为:
a
arr[20]
hello(int a,char b)
* AstTypeName
(* ARR)[3]
main(int argc, char const *argv[])
declarator: * declarator postfix-declarator postfix-declarator: direct-declarator postfix-declarator [ conditional-expressionopt ] postfix-declarator ( parameter-type-list ) direct-declarator: identifier ( declarator )
declarator的作用是在direct-declarator前面加一个或多个*,去掉*号后,我们的例子符合direct-declarator的这几种形式:
a //第一种
arr[20] //第三种
hello(int a,char b) //第四种
AstTypeName //第一种
(* ARR)[3] //第三种和第二种
main(int argc, char const *argv[]) //第四种
参数列表和声明很相似,但也有些不同,其一是声明支持初值,参数列表不支持初值。其二是参数列表支持可变参数。
parameter-type-list: parameter-list parameter-list, ... parameter-list: parameter-declaration parameter-list , parameter-declaration parameter-declaration: declaration-specifiers declarator
赋初值是由表达式expression完成的,这部分后面会介绍。
initializer: expression { initializer-list } { initializer-list , } initializer-list: initializer initializer-list , initializer
compound-statement: { declaration-listopt statement-listopt } declaration-list: declaration declaration-list declaration declaration: common-header ;
混合语句compound-statement中包含声明和语句。声明前面已经介绍过了,下面我们来看一下语句。
statement-list: statement statement-list statement statement: labeled-statement expression-statement compound-statement selection-statement iteration-statement jump-statement
语句由标签语句、表达式语句、选择语句、迭代语句、跳转语句构成。
labeled-statement: identifier : statement case conditional-expression : statement default : statement expression-statement: expressionopt ; selection-statement: if ( expression ) statement if (expression ) statement else statement switch ( expression ) statement iteration-statement: while ( expression ) statement do statement while ( expression ) ; for ( expressionopt ; expressionopt ; expressionopt ) statement jump-statement: goto identifier ; continue ; break ; return expressionopt ;
这些语句无非就是关键字+表达式构成的,相对于声明来说非常容易理解。
expression: conditional-expression unary-expression assignment-operator expression assignment-operator: one of = *= /= %= += -= <<= >>= &= ^= |=
expression可以带赋值符号,也可以不带赋值符号。
下面是二元运算的文法,为了表达二元运算的优先级,设计了“套路相同但又有层次结构”的产生式。
conditional-expression: logical-OR-expression logical-OR-expression ? expression : conditional-expression logical-OR-expression: logical-AND-expression logical-OR-expression || logical-AND-expression logical-AND-expression: inclusive-OR-expression logical-AND-expression && inclusive-OR-expression inclusive-OR-expression: exclusive-OR-expression inclusive-OR-expression | exclusive-OR-expression exclusive-OR-expression: AND-expression exclusive-OR-expression ^ AND-expression AND-expression: equality-expression AND-expression & equality-expression equality-expression: relational-expression equality-expression == relational-expression equality-expression != relational-expression relational-expression: shift-expression relational-expression < shift-expression relational-expression > shift-expression relational-expression <= shift-expression relational-expression >= shifit-expression shift-expression: additive-expression shift-expression << additive-expression shift-expression >> additive-expression additive-expression: multiplicative-expression additive-expression + multiplicative-expression additive-expression - multiplicative-expression multiplicative-expression: cast-expression multiplicative-expression * cast-expression multiplicative-expression / cast-expression multiplicative-expression % cast-expression
强制类型转换表达式:
cast-expression: unary-expression ( type-name ) cast-expression
接下来是一元表达式:
unary-expression: postfix-expression ++ unary-expression --unary-expression unary-operator cast-expression sizeof unary-expression sizeof ( type-name ) unary-operator: one of & * + - ~ ! postfix-expression: primary-expression postfix-expression [ expression ] postfix-expression ( argument-expression-listopt ) postfix-expression . identifier postfix-expression -> identifier postfix-expression ++ postfix-expression -- primary-expression: identifier constant string ( expression ) constant: integer-constant character-constant floating-constant enumeration-constant argument-expression-list: expression argument-expression-list , expression
下面的type-name是给类型转换和sizeof用的
type-name: specifier-qualifier-list abstract-declaratoropt abstract-declarator: pointer pointeropt direct-abstract-declarator direct-abstract-declarator: ( abstract-declarator ) direct-abstract-declaratoropt [ constant-expressionopt ] direct-abstract-declaratoropt ( parameter-type-listopt )
语法分析的代码完全是按照文法写的,所以只要看懂文法,代码应该不难理解。因为语法分析的代码太长,本文就不具体分析代码了。
当语法树建立完成后可以用graphviz打印出来,便于调试。