C语言编译器的语法分析过程

你知道C语言是用C语言写出来的吗?是不是感觉有点矛盾?

准确的说是C编译器是用C语言写出来的。有句名言叫:世界上本没有编程语言,有的只是编译器。我们写的程序,只不过是一些字符串,C编译器将这些字符串转换成机器代码才可以执行。而主流的C编译器如gcc、clang等都是由C语言编写的。

文法

下面是一个描述句子组成成分的方法:

句子 -> 主语 谓语 宾语 	     // 句子由主语和谓语和宾语构成
主语 -> 名词 | 代词 	     // 主语由名词或代词构成
谓语 -> 动词 		     // 谓语由动词构成
宾语 -> 形容词 | 名词 | 代词    // 宾语由形容词或名词或代词构成
代词 -> "I"                   //
动词 -> "love"                // 能继续推导的叫非终结符, 如动词
名词 -> "cake"                // 不能继续推导的叫终结符, 如"cake"

上面的例子太过简单,如果把这种描述方法运用的复杂一点,不知道你能不能看懂:

// 程序是由混合结构构成的,混合结构是由多个语句构成的
Program -> CompoundStatement
CompoundStatement -> { StatementList_opt }
StatementList -> Statement | StatementList Statement

// 语句由if、while、混合结构、ExpressionStatement构成
Statement -> IfStatement | WhileStatement | CompoundStatement | ExpressionStatement

// if 和 while语句
IfStatement -> if ( expression) Statement 
IfStatement -> if ( expression) Statement else Statement 
WhileStatement -> while(expression) Statement 

// ExpressionStatement由赋值语句和声明构成
ExpressionStatement -> id = Expression ; 
ExpressionStatement -> Declaration ;

// 表达式
Expression -> AdditiveExpression
AdditiveExpression -> MultiplicativeExpression 
AdditiveExpression -> AdditiveExpression + MultiplicativeExpression 
AdditiveExpression -> AdditiveExpression - MultiplicativeExpression 
MultiplicativeExpression -> PrimaryExpression 
MultiplicativeExpression -> MultiplicativeExpression * PrimaryExpression 
MultiplicativeExpression -> MultiplicativeExpression / PrimaryExpression 
PrimaryExpression -> id | num | (Expression) 

// 变量和函数的声明
Declaration -> int DeclaratorList
DeclaratorList -> Declarator | DeclaratorList , Declarator
Declarator -> * Declarator | PostfixDeclarator 
PostfixDeclarator -> DirectDeclarator | PostfixDeclarator[num] | PostfixDeclarator(ParameterList) 
ParameterList -> int | ParameterList, int
DirectDeclarator -> id | (Declarator)

其实它描述的是一个简化版的C语言。

文法描述的是一颗以Program为根节点,终结符为叶节点的树。编译器只需要按照文法,一点一点的分析代码,就可以将这颗树结构建立起来。这颗树被称为语法树

根据文法可以看懂许多复杂的代码,例如:

int(*f(int,int,int))[4];

一般人看不懂这东西,不过根据文法来分析就简单许多了。Declaration对应int(*f(int,int,int))[4],接着往下推导,可以知道PostfixDeclarator为(*f(int,int,int))[4],接下来是关键,因为我们的表达式中带着[],所以只能是PostfixDeclarator[num],所以编译器是先识别数组,再识别括号中的内容。

所以总的识别过程为:
INT ---> ARRAY[4] ---> POINTER ---> FUNCTION

根据识别过程,从前向后捋就可知定义的是什么东西:

  1. int
  2. int数组
  3. int数组的指针
  4. 返回int数组的指针的函数
int(*f(int,int,int))(int,int,int);

识别过程:
INT ---> FUNCTION ---> POINTER ---> FUNCTION

  1. int
  2. 返回int的函数
  3. 返回int的函数的指针
  4. 返回返回int的函数的指针的函数

实际代码中常用一个数组装很多函数指针,然后根据不同情况调用不同的函数,这种方法可以防止switch中的case过多,这个数组的定义如下:

AstExpression (* ExprCheckers[])(AstExpression);

上文的简单文法并不支持这种定义,不过我们还是可以猜测出它的识别过程:
AstExpression ---> FUNCTION ---> POINTER ---> ARRAY

  1. AstExpression
  2. 返回AstExpression的函数
  3. 返回AstExpression的函数的指针
  4. 返回AstExpression的函数的指针的数组

编译器的结构

image

词法分析、语法分析、语义检查和中间代码生成这几个阶段被称为编译器的前端,它们与具体的机器无关。C 编译器再根据中间代码生成不同硬件平台的汇编语言,这部分工作被称为编译器的后端。

image

C语言文法

C11标准每章的开头,有对应章节相关的文法。《The C Programming language》书中附录“A.13 Grammar”有完整的文法。

C标准文法的语法与前文的语法有些不同,不过原理都是相同的,很快可以猜出它的语法。C标准文法主要分为外部声明、语句和表达式三部分。看懂文法就相当于看懂了C语言解析器的结构,再看代码事半功倍。其中带_opt后缀的是可选项。


/********************以下是关于外部声明的文法*******************************
*
* C标准文法将变量的声明、变量的定义、函数的声明、函数的定义统统当作外部声明处理。
*/

translation-unit:
	external-declaration
	translation-unit external-declaration

external-declaration:
	function-definition
	declaration

// declaration-list_opt是为了兼容老式风格函数
function-definition:
	declaration-specifiers_opt declarator declaration-list_opt compound-statement

// 声明的格式:多个描述符 初始化列表;
declaration: 
	declaration-specifiers init-declarator-list_opt ;

declaration-list:
	declaration
	declaration-list declaration

// 多次从storage-class-specifier、type-specifier、type-qualifier中选择描述符
declaration-specifiers:
	storage-class-specifier declaration-specifiers_opt
	type-specifier declaration-specifiers_opt
	type-qualifier declaration-specifiers_opt

storage-class-specifier: one of
	auto register static extern typedef

type-specifier: one of
	void char short int long float double signed unsigned struct-OR-union-specifier enum-specifier typedef-name

type-qualifier:  one of
	const volatile

struct-OR-union-specifier:
	struct-OR-union identifier_opt { struct-declaration-list }
	struct-OR-union identifier

struct-OR-union: one of
	struct union

struct-declaration-list:
	struct-declaration
	struct-declaration-list struct-declaration

init-declarator-list:
	init-declarator
	init-declarator-list , init-declarator

init-declarator:
	declarator
	declarator = initializer

struct-declaration:
	specifier-qualifier-list struct-declarator-list ; 

specifier-qualifier-list:
	type-specifier specifier-qualijfier-list_opt 
	type-qualifier specifier-qualifier-list_opt 

struct-declarator-list:
	struct-declarator
	struct-declarator-list, struct-declarator 

struct-declarator:
	declarator
	declarator_opl : constant-expression

enum-specifier:
	enum identifier_opt { enumerator-list } 
	enum identifier

enumerator-list:
	enumerator
	enumerator-list , enumerator

enumerator:
	identifier
	identifier = constant-expression

declarator:
	pointer_opt direct-declarator

direct-declarator:
	identifier
	( declarator )
	direct-declarator [ constant-expression_opt ] 
	direct-declarator ( parameter-type-list ) 
	direct-declarator ( identifier-list_opt ) 

pointer:
	* type-qualifier-list_opt
	* type-qualifier-list_opt pointer

type-qualifier-list:
	type-qualifier
	type-qualifier-list type-qualifier

parameter-type-list:
	parameter-list
	parameter-list, ...

parameter-list:
	parameter-declaration
	parameter-list , parameter-declaration

parameter-declaration:
	declaration-specifiers declarator
	declaration-specifiers abstract-declarator_opt

identifier-list:
	identifier
	identifier-list , identifier

initializer:
	assignment-expression
	{ initializer-list }
	{ initializer-list , }

initializer-list:
	initializer
	initializer-list , initializer

type-name: 
	specifier-qualifier-list abstract-declarator_opt 

abstract-declarator:
	pointer
	pointer_opt direct-abstract-declarator

direct-abstract-declarator:
	( abstract-declarator )
	direct-abstract-declarator_opt [ constant-expression_opt ]
	direct-abstract-declarator_opt ( parameter-type-list_opt )

typedef-name:
	identifier

/*************************以下是语句的文法********************************/

statement:
	labeled-statement
	expression-statement
	compound-statement
	selection-statement
	iteration-statement
	jump-statement

labeled-statement:
	identifier : statement
	case constant-expression : statement 
	default : statement

expression-statement:
	expression_opt ;

// 遇到大括号意味着新的作用域
compound-statement:
	{ declaration-list_opt statement-list_opt }

statement-list:
	statement
	statement-list statement

selection-statement:
	if ( expression ) statement
	if (expression ) statement else statement 
	switch ( expression ) statement

iteration-statement:
	while ( expression ) statement
	do statement while ( expression ) ;
	for ( expression_opt ; expression_opt ; expression_opt ) statement 

jump-statement:
	goto identifier ;
	continue ;
	break ;
	return expression_opt ;

/*************************以下是表达式的文法*******************************
*
* C标准文法中为了表达不同二元运算符的优先级,设计了“套路相同但又有层次结构”的产生式。
* 从这些文法中可以清晰的看出各运算符的优先级。
* 
*/


expression:
	assignment-expression
	expression , assignment-expression

assignment-expression:
	conditional-expression
	unary-expression assignment-operator assignment-expression 

assignment-operator: one of
	= *= /= %= += -= <<= >>= &= ^= |=

conditional-expression:
	logical-OR-expression
	logical-OR-expression ? expression : conditional-expression

constant-expression:
	conditional-expression

logical-OR-expression:
	logical-AND-expression
	logical-OR-expression || logical-AND-expression

logical-AND-expression:
	inclusive-OR-expression
	logical-AND-expression && inclusive-OR-expression

inclusive-OR-expression:
	exclusive-OR-expression
	inclusive-OR-expression | exclusive-OR-expression

exclusive-OR-expression:
	AND-expression
	exclusive-OR-expression ^ AND-expression

AND-expression:
	equality-expression
	AND-expression & equality-expression

equality-expression:
	relational-expression
	equality-expression == relational-expression
	equality-expression != relational-expression

relational-expression:
	shift-expression
	relational-expression < shift-expression
	relational-expression > shift-expression
	relational-expression <= shift-expression
	relational-expression >= shifit-expression

shift-expression:
	additive-expression
	shift-expression << additive-expression 
	shift-expression >> additive-expression 

additive-expression:
	multiplicative-expression
	additive-expression + multiplicative-expression 
	additive-expression - multiplicative-expression 

multiplicative-expression:
	cast-expression
	multiplicative-expression * cast-expression 
	multiplicative-expression / cast-expression 
	multiplicative-expression % cast-expression 

cast-expression:
	unary-expression
	( type-name ) cast-expression

unary-expression:
	postfix-expression
	++ unary-expression
	--unary-expression
	unary-operator cast-expression
	sizeof unary-expression
	sizeof ( type-name )

unary-operator: one of
	& * + - ~ !

postfix-expression:
	primary-expression
	postfix-expression [ expression ]
	postfix-expression ( argument-expression-list_opt )
	postfix-expression . identifier
	postfix-expression -> identifier
	postfix-expression ++
	postfix-expression --

primary-expression:
	identifier
	constant
	string
	( expression )

argument-expression-list:
	assignment-expression
	argument-expression-list , assignment-expression 

constant:
	integer-constant
	character-constant
	floating-constant
	enumeration-constant

参考:
ucc162.3

posted @ 2021-04-21 15:03:32
评论加载中...

发表评论