下面是一个描述句子组成成分的方法:
句子 -> 主语 谓语 宾语 // 句子由主语和谓语和宾语构成
主语 -> 名词 | 代词 // 主语由名词或代词构成
谓语 -> 动词 // 谓语由动词构成
宾语 -> 形容词 | 名词 | 代词 // 宾语由形容词或名词或代词构成
代词 -> "I" //
动词 -> "love" // 能继续推导的叫非终结符, 如动词
名词 -> "cake" // 不能继续推导的叫终结符, 如"cake"
上面的例子太过简单,如果把这种描述方法运用的复杂一点,不知道你能不能看懂:
// 程序是由混合结构构成的,混合结构是由多个语句构成的
Program -> CompoundStatement
CompoundStatement -> { StatementList_opt }
StatementList -> Statement | StatementList Statement
// 语句由if、while、混合结构、ExpressionStatement构成
Statement -> IfStatement | WhileStatement | CompoundStatement | ExpressionStatement
// if 和 while语句
IfStatement -> if ( expression) Statement
IfStatement -> if ( expression) Statement else Statement
WhileStatement -> while(expression) Statement
// ExpressionStatement由赋值语句和声明构成
ExpressionStatement -> id = Expression ;
ExpressionStatement -> Declaration ;
// 表达式
Expression -> AdditiveExpression
AdditiveExpression -> MultiplicativeExpression
AdditiveExpression -> AdditiveExpression + MultiplicativeExpression
AdditiveExpression -> AdditiveExpression - MultiplicativeExpression
MultiplicativeExpression -> PrimaryExpression
MultiplicativeExpression -> MultiplicativeExpression * PrimaryExpression
MultiplicativeExpression -> MultiplicativeExpression / PrimaryExpression
PrimaryExpression -> id | num | (Expression)
// 变量和函数的声明
Declaration -> int DeclaratorList
DeclaratorList -> Declarator | DeclaratorList , Declarator
Declarator -> * Declarator | PostfixDeclarator
PostfixDeclarator -> DirectDeclarator | PostfixDeclarator[num] | PostfixDeclarator(ParameterList)
ParameterList -> int | ParameterList, int
DirectDeclarator -> id | (Declarator)
其实它描述的是一个简化版的C语言。
文法描述的是一颗以Program为根节点,终结符为叶节点的树。编译器只需要按照文法,一点一点的分析代码,就可以将这颗树结构建立起来。这颗树被称为语法树。
根据文法可以看懂许多复杂的代码,例如:
int(*f(int,int,int))[4];
一般人看不懂这东西,不过根据文法来分析就简单许多了。Declaration对应int(*f(int,int,int))[4]
,接着往下推导,可以知道PostfixDeclarator为(*f(int,int,int))[4]
,接下来是关键,因为我们的表达式中带着[]
,所以只能是PostfixDeclarator[num]
,所以编译器是先识别数组,再识别括号中的内容。
所以总的识别过程为:
INT ---> ARRAY[4] ---> POINTER ---> FUNCTION
根据识别过程,从前向后捋就可知定义的是什么东西:
int(*f(int,int,int))(int,int,int);
识别过程:
INT ---> FUNCTION ---> POINTER ---> FUNCTION
实际代码中常用一个数组装很多函数指针,然后根据不同情况调用不同的函数,这种方法可以防止switch中的case过多,这个数组的定义如下:
AstExpression (* ExprCheckers[])(AstExpression);
上文的简单文法并不支持这种定义,不过我们还是可以猜测出它的识别过程:
AstExpression ---> FUNCTION ---> POINTER ---> ARRAY
在C11标准每章的开头,有对应章节相关的文法。《The C Programming language》书中附录“A.13 Grammar”有完整的文法。
C标准文法的语法与前文的语法有些不同,不过原理都是相同的,很快可以猜出它的语法。C标准文法主要分为外部声明、语句和表达式三部分。看懂文法就相当于看懂了C语言解析器的结构,再看代码事半功倍。其中带opt
下标的是可选项。
/********************以下是关于外部声明的文法******************************* * * C标准文法将变量的声明、变量的定义、函数的声明、函数的定义统统当作外部声明处理。 */ translation-unit: external-declaration translation-unit external-declaration external-declaration: function-definition declaration // declaration-listopt是为了兼容老式风格函数 function-definition: declaration-specifiersopt declarator declaration-listopt compound-statement // 声明的格式:多个描述符 初始化列表; declaration: declaration-specifiers init-declarator-listopt ; declaration-list: declaration declaration-list declaration // 多次从storage-class-specifier、type-specifier、type-qualifier中选择描述符 declaration-specifiers: storage-class-specifier declaration-specifiersopt type-specifier declaration-specifiersopt type-qualifier declaration-specifiersopt storage-class-specifier: one of auto register static extern typedef type-specifier: one of void char short int long float double signed unsigned struct-OR-union-specifier enum-specifier typedef-name type-qualifier: one of const volatile struct-OR-union-specifier: struct-OR-union identifieropt { struct-declaration-list } struct-OR-union identifier struct-OR-union: one of struct union struct-declaration-list: struct-declaration struct-declaration-list struct-declaration init-declarator-list: init-declarator init-declarator-list , init-declarator init-declarator: declarator declarator = initializer struct-declaration: specifier-qualifier-list struct-declarator-list ; specifier-qualifier-list: type-specifier specifier-qualijfier-listopt type-qualifier specifier-qualifier-listopt struct-declarator-list: struct-declarator struct-declarator-list, struct-declarator struct-declarator: declarator declarator_opl : constant-expression enum-specifier: enum identifieropt { enumerator-list } enum identifier enumerator-list: enumerator enumerator-list , enumerator enumerator: identifier identifier = constant-expression declarator: pointeropt direct-declarator direct-declarator: identifier ( declarator ) direct-declarator [ constant-expressionopt ] direct-declarator ( parameter-type-list ) direct-declarator ( identifier-listopt ) pointer: * type-qualifier-listopt * type-qualifier-listopt pointer type-qualifier-list: type-qualifier type-qualifier-list type-qualifier parameter-type-list: parameter-list parameter-list, ... parameter-list: parameter-declaration parameter-list , parameter-declaration parameter-declaration: declaration-specifiers declarator declaration-specifiers abstract-declaratoropt identifier-list: identifier identifier-list , identifier initializer: assignment-expression { initializer-list } { initializer-list , } initializer-list: initializer initializer-list , initializer type-name: specifier-qualifier-list abstract-declaratoropt abstract-declarator: pointer pointeropt direct-abstract-declarator direct-abstract-declarator: ( abstract-declarator ) direct-abstract-declaratoropt [ constant-expressionopt ] direct-abstract-declaratoropt ( parameter-type-listopt ) typedef-name: identifier /*************************以下是语句的文法********************************/ compound-statement: { declaration-listopt statement-listopt } statement-list: statement statement-list statement statement: labeled-statement expression-statement compound-statement selection-statement iteration-statement jump-statement labeled-statement: identifier : statement case constant-expression : statement default : statement expression-statement: expressionopt ; selection-statement: if ( expression ) statement if (expression ) statement else statement switch ( expression ) statement iteration-statement: while ( expression ) statement do statement while ( expression ) ; for ( expressionopt ; expressionopt ; expressionopt ) statement jump-statement: goto identifier ; continue ; break ; return expressionopt ; /*************************以下是表达式的文法******************************* * * C标准文法中为了表达不同二元运算符的优先级,设计了“套路相同但又有层次结构”的产生式。 * 从这些文法中可以清晰的看出各运算符的优先级。 * */ expression: assignment-expression expression , assignment-expression assignment-expression: conditional-expression unary-expression assignment-operator assignment-expression assignment-operator: one of = *= /= %= += -= <<= >>= &= ^= |= conditional-expression: logical-OR-expression logical-OR-expression ? expression : conditional-expression constant-expression: conditional-expression logical-OR-expression: logical-AND-expression logical-OR-expression || logical-AND-expression logical-AND-expression: inclusive-OR-expression logical-AND-expression && inclusive-OR-expression inclusive-OR-expression: exclusive-OR-expression inclusive-OR-expression | exclusive-OR-expression exclusive-OR-expression: AND-expression exclusive-OR-expression ^ AND-expression AND-expression: equality-expression AND-expression & equality-expression equality-expression: relational-expression equality-expression == relational-expression equality-expression != relational-expression relational-expression: shift-expression relational-expression < shift-expression relational-expression > shift-expression relational-expression <= shift-expression relational-expression >= shifit-expression shift-expression: additive-expression shift-expression << additive-expression shift-expression >> additive-expression additive-expression: multiplicative-expression additive-expression + multiplicative-expression additive-expression - multiplicative-expression multiplicative-expression: cast-expression multiplicative-expression * cast-expression multiplicative-expression / cast-expression multiplicative-expression % cast-expression cast-expression: unary-expression ( type-name ) cast-expression unary-expression: postfix-expression ++ unary-expression --unary-expression unary-operator cast-expression sizeof unary-expression sizeof ( type-name ) unary-operator: one of & * + - ~ ! postfix-expression: primary-expression postfix-expression [ expression ] postfix-expression ( argument-expression-listopt ) postfix-expression . identifier postfix-expression -> identifier postfix-expression ++ postfix-expression -- primary-expression: identifier constant string ( expression ) argument-expression-list: assignment-expression argument-expression-list , assignment-expression constant: integer-constant character-constant floating-constant enumeration-constant