下面是一个描述句子组成成分的方法:
句子 -> 主语 谓语 宾语 // 句子由主语和谓语和宾语构成
主语 -> 名词 | 代词 // 主语由名词或代词构成
谓语 -> 动词 // 谓语由动词构成
宾语 -> 形容词 | 名词 | 代词 // 宾语由形容词或名词或代词构成
代词 -> "I" //
动词 -> "love" // 能继续推导的叫非终结符, 如动词
名词 -> "cake" // 不能继续推导的叫终结符, 如"cake"
上面的例子太过简单,如果把这种描述方法运用的复杂一点,不知道你能不能看懂:
// 程序是由混合结构构成的,混合结构是由多个语句构成的
Program -> CompoundStatement
CompoundStatement -> { StatementList_opt }
StatementList -> Statement | StatementList Statement
// 语句由if、while、混合结构、ExpressionStatement构成
Statement -> IfStatement | WhileStatement | CompoundStatement | ExpressionStatement
// if 和 while语句
IfStatement -> if ( expression) Statement
IfStatement -> if ( expression) Statement else Statement
WhileStatement -> while(expression) Statement
// ExpressionStatement由赋值语句和声明构成
ExpressionStatement -> id = Expression ;
ExpressionStatement -> Declaration ;
// 表达式
Expression -> AdditiveExpression
AdditiveExpression -> MultiplicativeExpression
AdditiveExpression -> AdditiveExpression + MultiplicativeExpression
AdditiveExpression -> AdditiveExpression - MultiplicativeExpression
MultiplicativeExpression -> PrimaryExpression
MultiplicativeExpression -> MultiplicativeExpression * PrimaryExpression
MultiplicativeExpression -> MultiplicativeExpression / PrimaryExpression
PrimaryExpression -> id | num | (Expression)
// 变量和函数的声明
Declaration -> int DeclaratorList
DeclaratorList -> Declarator | DeclaratorList , Declarator
Declarator -> * Declarator | PostfixDeclarator
PostfixDeclarator -> DirectDeclarator | PostfixDeclarator[num] | PostfixDeclarator(ParameterList)
ParameterList -> int | ParameterList, int
DirectDeclarator -> id | (Declarator)
其实它描述的是一个简化版的C语言。
文法描述的是一颗以Program为根节点,终结符为叶节点的树。编译器只需要按照文法,一点一点的分析代码,就可以将这颗树结构建立起来。这颗树被称为语法树。
根据文法可以看懂许多复杂的代码,例如:
int(*f(int,int,int))[4];
一般人看不懂这东西,不过根据文法来分析就简单许多了。Declaration对应int(*f(int,int,int))[4],接着往下推导,可以知道PostfixDeclarator为(*f(int,int,int))[4],接下来是关键,因为我们的表达式中带着[],所以只能是PostfixDeclarator[num],所以编译器是先识别数组,再识别括号中的内容。
所以总的识别过程为:
INT ---> ARRAY[4] ---> POINTER ---> FUNCTION
根据识别过程,从前向后捋就可知定义的是什么东西:
int(*f(int,int,int))(int,int,int);
识别过程:
INT ---> FUNCTION ---> POINTER ---> FUNCTION
实际代码中常用一个数组装很多函数指针,然后根据不同情况调用不同的函数,这种方法可以防止switch中的case过多,这个数组的定义如下:
AstExpression (* ExprCheckers[])(AstExpression);
上文的简单文法并不支持这种定义,不过我们还是可以猜测出它的识别过程:
AstExpression ---> FUNCTION ---> POINTER ---> ARRAY
在C11标准每章的开头,有对应章节相关的文法。《The C Programming language》书中附录“A.13 Grammar”有完整的文法。
C标准文法的语法与前文的语法有些不同,不过原理都是相同的,很快可以猜出它的语法。C标准文法主要分为外部声明、语句和表达式三部分。看懂文法就相当于看懂了C语言解析器的结构,再看代码事半功倍。其中带opt下标的是可选项。
/********************以下是关于外部声明的文法*******************************
*
* C标准文法将变量的声明、变量的定义、函数的声明、函数的定义统统当作外部声明处理。
*/
translation-unit:
external-declaration
translation-unit external-declaration
external-declaration:
function-definition
declaration
// declaration-listopt是为了兼容老式风格函数
function-definition:
declaration-specifiersopt declarator declaration-listopt compound-statement
// 声明的格式:多个描述符 初始化列表;
declaration:
declaration-specifiers init-declarator-listopt ;
declaration-list:
declaration
declaration-list declaration
// 多次从storage-class-specifier、type-specifier、type-qualifier中选择描述符
declaration-specifiers:
storage-class-specifier declaration-specifiersopt
type-specifier declaration-specifiersopt
type-qualifier declaration-specifiersopt
storage-class-specifier: one of
auto register static extern typedef
type-specifier: one of
void char short int long float double signed unsigned struct-OR-union-specifier enum-specifier typedef-name
type-qualifier: one of
const volatile
struct-OR-union-specifier:
struct-OR-union identifieropt { struct-declaration-list }
struct-OR-union identifier
struct-OR-union: one of
struct union
struct-declaration-list:
struct-declaration
struct-declaration-list struct-declaration
init-declarator-list:
init-declarator
init-declarator-list , init-declarator
init-declarator:
declarator
declarator = initializer
struct-declaration:
specifier-qualifier-list struct-declarator-list ;
specifier-qualifier-list:
type-specifier specifier-qualijfier-listopt
type-qualifier specifier-qualifier-listopt
struct-declarator-list:
struct-declarator
struct-declarator-list, struct-declarator
struct-declarator:
declarator
declarator_opl : constant-expression
enum-specifier:
enum identifieropt { enumerator-list }
enum identifier
enumerator-list:
enumerator
enumerator-list , enumerator
enumerator:
identifier
identifier = constant-expression
declarator:
pointeropt direct-declarator
direct-declarator:
identifier
( declarator )
direct-declarator [ constant-expressionopt ]
direct-declarator ( parameter-type-list )
direct-declarator ( identifier-listopt )
pointer:
* type-qualifier-listopt
* type-qualifier-listopt pointer
type-qualifier-list:
type-qualifier
type-qualifier-list type-qualifier
parameter-type-list:
parameter-list
parameter-list, ...
parameter-list:
parameter-declaration
parameter-list , parameter-declaration
parameter-declaration:
declaration-specifiers declarator
declaration-specifiers abstract-declaratoropt
identifier-list:
identifier
identifier-list , identifier
initializer:
assignment-expression
{ initializer-list }
{ initializer-list , }
initializer-list:
initializer
initializer-list , initializer
type-name:
specifier-qualifier-list abstract-declaratoropt
abstract-declarator:
pointer
pointeropt direct-abstract-declarator
direct-abstract-declarator:
( abstract-declarator )
direct-abstract-declaratoropt [ constant-expressionopt ]
direct-abstract-declaratoropt ( parameter-type-listopt )
typedef-name:
identifier
/*************************以下是语句的文法********************************/
compound-statement:
{ declaration-listopt statement-listopt }
statement-list:
statement
statement-list statement
statement:
labeled-statement
expression-statement
compound-statement
selection-statement
iteration-statement
jump-statement
labeled-statement:
identifier : statement
case constant-expression : statement
default : statement
expression-statement:
expressionopt ;
selection-statement:
if ( expression ) statement
if (expression ) statement else statement
switch ( expression ) statement
iteration-statement:
while ( expression ) statement
do statement while ( expression ) ;
for ( expressionopt ; expressionopt ; expressionopt ) statement
jump-statement:
goto identifier ;
continue ;
break ;
return expressionopt ;
/*************************以下是表达式的文法*******************************
*
* C标准文法中为了表达不同二元运算符的优先级,设计了“套路相同但又有层次结构”的产生式。
* 从这些文法中可以清晰的看出各运算符的优先级。
*
*/
expression:
assignment-expression
expression , assignment-expression
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression
assignment-operator: one of
= *= /= %= += -= <<= >>= &= ^= |=
conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression
constant-expression:
conditional-expression
logical-OR-expression:
logical-AND-expression
logical-OR-expression || logical-AND-expression
logical-AND-expression:
inclusive-OR-expression
logical-AND-expression && inclusive-OR-expression
inclusive-OR-expression:
exclusive-OR-expression
inclusive-OR-expression | exclusive-OR-expression
exclusive-OR-expression:
AND-expression
exclusive-OR-expression ^ AND-expression
AND-expression:
equality-expression
AND-expression & equality-expression
equality-expression:
relational-expression
equality-expression == relational-expression
equality-expression != relational-expression
relational-expression:
shift-expression
relational-expression < shift-expression
relational-expression > shift-expression
relational-expression <= shift-expression
relational-expression >= shifit-expression
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression - multiplicative-expression
multiplicative-expression:
cast-expression
multiplicative-expression * cast-expression
multiplicative-expression / cast-expression
multiplicative-expression % cast-expression
cast-expression:
unary-expression
( type-name ) cast-expression
unary-expression:
postfix-expression
++ unary-expression
--unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof ( type-name )
unary-operator: one of
& * + - ~ !
postfix-expression:
primary-expression
postfix-expression [ expression ]
postfix-expression ( argument-expression-listopt )
postfix-expression . identifier
postfix-expression -> identifier
postfix-expression ++
postfix-expression --
primary-expression:
identifier
constant
string
( expression )
argument-expression-list:
assignment-expression
argument-expression-list , assignment-expression
constant:
integer-constant
character-constant
floating-constant
enumeration-constant