Safe Haskell | Safe |
---|---|
Language | Haskell2010 |
FreeC.Frontend.IR.Token
Description
This module contains the token data type for the intermediate language.
Synopsis
- data Token
- mkIdentToken :: String -> Token
- mkSymbolToken :: String -> Token
- specialSymbols :: [(String, Token)]
- data Keyword
- keywords :: [(String, Keyword)]
Tokens
The token classes of the intermediate language.
In the description of the grammar for the token classes below, we are using the following character classes.
<lower>   ::= "a" | … | "z" | <any lowercase Unicode letter>
<upper>   ::= "A" | … | "Z" | <any upper- or titlecase Unicode letter>
<numeric> ::= <digit> | <any Unicode numeric character>
<digit>   ::= "0" | … | "9"
<octit>   ::= "0" | … | "7"
<hexit>   ::= "0" | … | "9" | "a" | … | "f" | "A" | … | "F"
<symbol>  ::= <any Unicode symbol or punctuation>
<graphic> ::= <lower> | <upper> | <symbol> | <numeric>
<space>   ::= " "
Identifiers
Identifiers can contain letters (both upper- and lowercase), digits, underscores and apostrophes. However, identifiers must not start with a digit or an apostrophe. Identifiers starting with an uppercase letter identify constructors.
<identletter> ::= <lower> | <upper> | <numeric> | "_" | "'"
<varid>       ::= (<lower> | "_") { <identletter> }
<conid>       ::= <upper> { <identletter> }
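The identifier grammar above can be sketched with the character predicates from `Data.Char`. This is only an illustration: the helper names `isIdentLetter`, `isVarId` and `isConId` are hypothetical and not part of this module, and it is an assumption that `isLower`, `isUpper` and `isNumber` correspond to the `<lower>`, `<upper>` and `<numeric>` classes used by the actual lexer.

```haskell
import Data.Char (isLower, isNumber, isUpper)

-- | Matches <identletter>: letters, numeric characters, "_" and "'".
--   (Hypothetical helper, not exported by FreeC.Frontend.IR.Token.)
isIdentLetter :: Char -> Bool
isIdentLetter c = isLower c || isUpper c || isNumber c || c == '_' || c == '\''

-- | Matches <varid>: a lowercase letter or underscore followed by
--   any number of identifier letters.
isVarId :: String -> Bool
isVarId (c : cs) = (isLower c || c == '_') && all isIdentLetter cs
isVarId []       = False

-- | Matches <conid>: an uppercase (or titlecase) letter followed by
--   any number of identifier letters.
isConId :: String -> Bool
isConId (c : cs) = isUpper c && all isIdentLetter cs
isConId []       = False
```

With these definitions, `isVarId "_x1"` and `isConId "Foo'"` hold, while `isVarId "1x"` and `isVarId "'x"` do not, because a digit or apostrophe is only allowed after the first character.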
Symbolic names
Symbolic names are sequences of arbitrary Unicode symbol and punctuation characters wrapped in parentheses. The name itself must not contain parentheses.
Constructor symbols are either empty or start with one of "[", ":" or ",". The "[" and "," start characters are needed in addition to ":" (which is the start character of constructor symbols in Haskell) such that the empty list constructor and tuple constructors are recognized correctly.
<namesymbol>  ::= <symbol> \ ( "(" | ")" )
<consymstart> ::= "[" | ":" | ","
<varsym>      ::= "(" (<namesymbol> \ <consymstart>) { <namesymbol> } ")"
<consym>      ::= "(" [ <consymstart> { <namesymbol> } ] ")"
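As a rough illustration of the grammar for symbolic names, the following sketch classifies the text between the parentheses. The type `SymbolClass` and the functions `isNameSymbol`, `isConSymStart` and `classifySymbol` are hypothetical names introduced for this example, and mapping `<symbol>` to `isSymbol` or `isPunctuation` from `Data.Char` is an assumption. The sketch also ignores the special symbols that cannot be used as symbolic names.

```haskell
import Data.Char (isPunctuation, isSymbol)

-- | Matches <namesymbol>: any Unicode symbol or punctuation character
--   except for parentheses. (Hypothetical helper.)
isNameSymbol :: Char -> Bool
isNameSymbol c = (isSymbol c || isPunctuation c) && c `notElem` "()"

-- | Matches <consymstart>.
isConSymStart :: Char -> Bool
isConSymStart c = c `elem` "[:,"

-- | Result of classifying a symbolic name.
data SymbolClass = ConSym | VarSym | NotASymbol
  deriving (Eq, Show)

-- | Classifies a symbolic name that is given without the surrounding
--   parentheses, e.g. ">>=" for the token "(>>=)".
classifySymbol :: String -> SymbolClass
classifySymbol [] = ConSym -- the empty name is a constructor symbol
classifySymbol name@(c : _)
  | not (all isNameSymbol name) = NotASymbol
  | isConSymStart c             = ConSym
  | otherwise                   = VarSym
```

Under these assumptions, `classifySymbol "[]"` and `classifySymbol ",,"` yield `ConSym`, so the empty list and tuple constructors are covered, while `classifySymbol ">>="` yields `VarSym`.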
Integers
Integer tokens have an optional sign and can be in decimal, octal or hexadecimal notation. The prefixes of octal and hexadecimal integers as well as the digits (or "hexits") of hexadecimal integers are case insensitive.
<decimal>     ::= <digit> { <digit> }
<octal>       ::= <octit> { <octit> }
<hexadecimal> ::= <hexit> { <hexit> }
<integer>     ::= [ "+" | "-" ] <natural>
<natural>     ::= <decimal> | "0o" <octal> | "0O" <octal> | "0x" <hexadecimal> | "0X" <hexadecimal>
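The integer grammar can be approximated with the readers from `Numeric`. The functions `parseNatural`, `parseInteger` and `fullParse` below are hypothetical helpers written for this example. They operate on a complete string rather than on a character stream, so this is a sketch of the notation and not of how the lexer is actually implemented.

```haskell
import Data.Char (toLower)
import Numeric (readDec, readHex, readOct)

-- | Accepts the result of a reader only if it consumed the whole input.
fullParse :: ReadS Integer -> String -> Maybe Integer
fullParse reader input = case reader input of
  [(n, "")] -> Just n
  _         -> Nothing

-- | Parses a <natural>: a decimal number, or an octal or hexadecimal
--   number with a case-insensitive "0o" or "0x" prefix.
parseNatural :: String -> Maybe Integer
parseNatural ('0' : p : ds)
  | toLower p == 'o' = fullParse readOct ds
  | toLower p == 'x' = fullParse readHex ds
-- No prefix matched: fall through and read the input as decimal.
parseNatural ds = fullParse readDec ds

-- | Parses an <integer>: an optional sign followed by a <natural>.
parseInteger :: String -> Maybe Integer
parseInteger ('+' : ds) = parseNatural ds
parseInteger ('-' : ds) = negate <$> parseNatural ds
parseInteger ds         = parseNatural ds
```

For example, `parseInteger "0o17"` evaluates to `Just 15`, `parseInteger "-0X1f"` to `Just (-31)`, and `parseInteger "12a"` to `Nothing` because of the trailing letter.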
Strings
Strings are wrapped in double quotes. They can contain arbitrary Unicode letters, digits, symbols and punctuation characters as well as spaces and escape sequences.
<escape> ::= <any Haskell escape sequence>
<gap>    ::= "\" { <arbitrary Unicode whitespace> } "\"
<string> ::= '"' { <graphic> \ ( '"' | "\" ) | <space> | <escape> | <gap> } '"'
Constructors
ConIdent String | |
ConSymbol String | |
VarIdent String | |
VarSymbol String | |
Keyword Keyword | A keyword |
IntToken Integer | |
StrToken String | |
At | "@" |
Comma | "," |
Dot | "." |
DoubleColon | "::" |
Equals | "=" |
Lambda | "\" |
LBrace | "{" |
LParen | "(" |
Pipe | "|" |
RBrace | "}" |
RParen | ")" |
RArrow | "->" |
Semi | ";" |
Bang | "!" |
mkIdentToken :: String -> Token
Constructs a ConIdent, VarIdent or Keyword token for the given identifier or keyword.
Constructors and variables can be told apart by the case of their first character. Constructors start with upper case letters and variables with lower case letters or an underscore.
If the given string occurs in keywords, the corresponding keyword token is returned instead.
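The behaviour described above could look roughly like the following sketch. The `Keyword` and `Token` types and the `keywords` table are trimmed-down, hypothetical stand-ins for the ones in this module, and the function body is an assumption based on the description, not the module's actual source.

```haskell
import Data.Char (isUpper)

-- | Trimmed-down stand-in for the module's Keyword type.
data Keyword = CASE | OF | LET | IN
  deriving (Eq, Show)

-- | Trimmed-down stand-in for the module's Token type.
data Token = ConIdent String | VarIdent String | Keyword Keyword
  deriving (Eq, Show)

-- | Excerpt of the keyword table (the real table has more entries).
keywords :: [(String, Keyword)]
keywords = [("case", CASE), ("of", OF), ("let", LET), ("in", IN)]

-- | Looks the string up in the keyword table first and otherwise
--   classifies it by the case of its first character.
mkIdentToken :: String -> Token
mkIdentToken ident = case lookup ident keywords of
  Just keyword -> Keyword keyword
  Nothing
    | startsUpper ident -> ConIdent ident
    | otherwise         -> VarIdent ident
 where
  startsUpper (c : _) = isUpper c
  startsUpper []      = False
```

In this sketch, `mkIdentToken "case"` yields `Keyword CASE`, `mkIdentToken "Just"` yields `ConIdent "Just"` and `mkIdentToken "_x"` yields `VarIdent "_x"`.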
mkSymbolToken :: String -> Token
Special Symbols
specialSymbols :: [(String, Token)]
Symbols that cannot be used as symbolic names.
Since the intermediate language is only parsed in tests, this constrains only the identifiers that can be used in tests. If the IR AST is generated by the frontend, identifiers are allowed to collide with keywords.
Keywords
Constructors
CASE | "case" |
DATA | "data" |
ELSE | "else" |
ERROR | "error" |
FORALL | "forall" |
IF | "if" |
IMPORT | "import" |
IN | "in" |
LET | "let" |
MODULE | "module" |
OF | "of" |
THEN | "then" |
TRACE | "trace" |
TYPE | "type" |
UNDEFINED | "undefined" |
WHERE | "where" |
keywords :: [(String, Keyword)]
Maps reserved words that cannot be used as identifiers to the corresponding Keyword values.
Since the intermediate language is only parsed in tests, this constrains only the identifiers that can be used in tests. If the IR AST is generated by the frontend, identifiers are allowed to collide with keywords.