Description of the language

Warning

This is a work in progress. Not everything described here can be taken for granted until we produce a stable release.

Goals and inspiration

The language is inspired in other functional languages, but if you find it too Haskell-like, that’s not a coincidence. We’re using Haskell as the gold standard, but our language is much less ambitious.

We only aim to integrate a statically typed language into Python, with type checking and type inference (Damas-Hindley-Milner) and where pattern matching is used to do most of the branching.

The idea is to complement programs that expose some sort of programming to its users, but do not require a general programming language.

We have such a system; users needs to define complex procedures to compute prices. However complex, all price schemes seem to be programmable with simple expressions, function application and composition and a powerful pattern matching (probably with some extensions for matching dates and date intervals).

We provide a parser, but the system using this language may choose to present its user with a very different way to build programs.

The language

The language allows to define algebraic data types (ADTs) and values (functions, most likely). A full program is just a non-empty sequence of definitions.

Definitions come in five types:

  • Type (scheme) declarations;
  • Algebraic data types;
  • Value definitions (functions belong here);
  • Type classes definitions; and
  • Instances (of type classes).

The order of the definitions in a program is unimportant. But the order of the equations within a function definition is important.

A very simple program:

>>> from xotl.fl import parse
>>> parse(r'''
...     alias :: [Char]
...     alias = name
...
...     name = "xotl.fl"
...
...     myId :: a -> a
...     myId x = x
...
...     myId2 = \x -> x
...
...     myConst :: a -> b -> a
...     myConst a _ = a
...
...     data MyList a = EmptyList
...                   | Cons a (MyList a)
...                   deriving (Eq)
...
...     insert a xs  = Cons a xs
...
...     isnull :: MyList a -> Bool
...     isnull EmptyList = True
...     isnull _         = False
...
...     insert :: a -> MyList a -> MyList a
...
...     concat :: MyList a -> MyList a -> MyList a
... ''')
[...]

Notice that:

  • We use the identifier ‘name’ in the definition of ‘alias’, before the very definition of ‘name’. This is no problem at all.

  • We are not required to provide type declarations for all values.

  • We can provide the type declarations after or before the value definition.

  • We may even provide type declarations for things we didn’t define (‘concat’). This makes the name available for type-checking but the value is supposedly provided by other means. In order, for a program to run all values must be provided. Undefined values default to a runtime error.

    Note

    There are things the parse() allows to do that you shouldn’t. We might change our mind and prohibit them in the future.

Parsing is not the whole story. Parsing just creates an Abstract Syntax Tree out of your source code. For things to really work, you need to type-check them and run them.

Expressions

The right hand side of values (and function) definitions are made up from expressions. The AST of expressions is documented in xotl.fl.ast.expressions.

In the examples below, the return of the parse() is always an instance of some AST class.

Literals and identifiers

The simplest expressions are those made up of a single identifier or a literal value.

Identifiers are made up of letters, digits and ‘_’ but they must not start with a digit.

Examples:

>>> from xotl.fl.parsers.expressions import parse
>>> parse('a')
Identifier('a')
>>> parse('_1e')
Identifier('_1e')

The expression language allows literal values:

  • Unicode characters are surrounded with apostrophes '. You can use the backslash (\) to enter the apostrophe, the backslash itself and other Unicode code points.

    Examples:

    >>> parse(r"'\\'")
    Literal('\\', TypeCons('Char', ()))
    
    >>> parse(r"'\''")
    Literal("'", TypeCons('Char', ()))
    
    >>> parse(r"'\x20'")
    Literal(' ', TypeCons('Char', ()))
    
    >>> parse(r"'\u0020'")
    Literal(' ', TypeCons('Char', ()))
    

    Notice that the value in the Literal object is a Python string; but it will always be one character long.

  • Strings are surrounded with quotation mark ". You can use the backslash to enter the quotation mark, the backslash itself and other Unicode code points.

    Example:

    >>> parse('""')
    Literal('', TypeCons('[]', (TypeCons('Char', ()),)))
    
    >>> parse(r'"\""')
    Literal('"', TypeCons('[]', (TypeCons('Char', ()),)))
    
    >>> parse(r'"\\"')
    Literal('\\', TypeCons('[]', (TypeCons('Char', ()),)))
    

    Notice the String type is just the list of Char.

  • Numbers. We collapse integers and floats into a single type the numbers. Integers can be written in base 10, 2, 8 and 16:

    >>> parse('1000') == parse('0x03e8') == parse('0b001111101000')
    True
    
    >>> parse('1000') == parse('0o1750')
    True
    

    You can use ‘_’ as a padding to make your numbers more readable:

    >>> parse('1_000') == parse('0x03e8') == parse('0b0011_1110_1000')
    True
    

    You can use as many as you like and wherever you need it (except at the beginning):

    >>> parse('0b0_1_01___0') == parse('0b1010')
    True
    

    You can use the exponent to represent floating point numbers:

    >>> parse('1e+200')  # doctest: +ELLIPSIS
    Literal(1e+200, ...)
    

    But beware of a leading ‘_’:

    >>> parse('_1e+200')  # doctest: +ELLIPSIS
    Application(Application(Identifier('+'), Identifier('_1e')), ...)
    
  • The unit value. This is the only value of the UnitType:

    >>> parse('()')
    Literal((), TypeCons('Unit', ()))
    

Application

Application (function invocation in other languages) is represented by white space.

Examples:

>>> parse('f a')
Application(Identifier('f'), Identifier('a'))

Application is left associative and it’s the operation with the highest priority:

>>> parse('f a b') == parse('(f a) b')
True

Composition

The dot operator (.) represents composition of functions. In the AST this is just the application of the identifier . to its arguments:

>>> parse('f . g')
Application(Application(Identifier('.'), Identifier('f')), Identifier('g'))
>>> parse('f . g') == parse('(.) f g')
True
>>> parse('(.) f')
Application(Identifier('.'), Identifier('f'))
>>> parse('(.)')
Identifier('.')

It gains special treatment because it associates to the right and, after the application, is next in priority:

>>> parse('f . g . h') == parse('f . (g . h)')
True
>>> parse('f g . h') == parse('(f g) . h')
True
>>> # This funny expression is syntactically valid, but it won't type-check.
>>> parse('f . g + 1') == parse('(f . g) + 1')
True

Warning

There must be a space before and/or after the dot operator.

Operators

The standard operators +, -, *, /, //, % stand for binary operations between numbers. They all associate to the left. The operators *, /, // and % have the same precedence between them, but higher than +, and -:

>>> parse('a + b * c') == parse('a + (b * c)')
True
>>> parse('a * b / c') == parse('(a * b)/c')
True

Any other combination of those symbols along with any of <>$^&!@#=| are user operators and they have less precedence that the binary operators. Notice that standard comparison operators (<, >, <=, >=, == and !=) are in this category:

>>> parse('a + b <= c - d') == parse('(a + b) <= (c - d)')
True
>>> parse('return >=> m')
Application(Application(Identifier('>=>'), Identifier('return')), Identifier('m'))

Infix form of a function application

Any identifier can become an infix operator by enclosing it in ticks (`). Infix has the lowest precedence:

>>> parse('a `f` b') == parse('f a b')
True
>>> parse('a > b `f` c - d') == parse('(a > b) `f` (c - d)')
True

Lambdas

Lambda abstractions are represented with the concise syntax of Haskell:

\args -> body

Even though the AST Lambda supports a single argument the parser admits several and does the expected currying:

>>> parse(r'\a b -> a') == parse(r'\a -> \b -> a')
True
>>> parse(r'\a -> \b -> a')
Lambda('a', Lambda('b', Identifier('a')))

Let and where

A let expression has the general schema:

let <name 1> [<patterns 1>] = <body 1>
    ...
    <name n> [<patterns n>] = <body 2>
in <expression>

The patterns must be identifiers of pattern-matching data constructor. When doing several definitions you must split each definition with a newline and indent the next line [1].

The parser will produce a ConcreteLet node.

The ‘where’ expressions produce the same AST. The general schema is:

<expression> where <name 1> [<patterns 1>] = <body 1>
                   <name 2> [<patterns 2>] = <body 2>
                   ...

Lists

The : operator is used to created lists. It has the builtin type a -> [a] -> [a]. : is right-associative:

>>> parse('a:b:xs') == parse('a:(b:xs)')
True

It has less precedence than any other operator except the infix form:

>>> parse('a + b:xs') == parse('(a + b):xs')
True
>>> parse('a `f` b:xs') == parse('a `f` (b:xs)')
True

The empty list is the identifier []:

>>> parse('[]')
Identifier('[]')

See also

The empty list identifier if you want to know why this is an identifier and not a literal.

The usual list syntax can be used in place of the : operator:

>>> parse('[1, 2]') == parse('1:2:[]')
True

Note

The parser allows heterogeneous types, but the typechecker will reject them:

>>> from xotl.fl.typecheck import typecheck
>>> from xotl.fl.builtins import builtins_env
>>> typecheck(parse('[1, "a"]'), builtins_env)
Traceback (most recent call last):
...
UnificationError: Cannot type-check ...

Tuples

Tuple are a sequence of 2 or more expressions. Unlike Python’s tuples the number (and types) of components of tuple are precise and functions may take tuples of a specific type.

Examples:

>>> parse('(1, "a")')
Application(Application(Identifier(','), Literal(1, ...
>>> parse('(1, "a", id x)')
Application(Application(Application(Identifier(',,'), Literal(1, ...
>>> parse('(1, "a", id x, 0)')
Application(Application(Application(Application(Identifier(',,,'), Literal(1, ...

Type declarations

Type declarations state the type of a symbol. The function xotl.fl.parsers.types.parse() parses the type expression (the thing after the two colons) and return an instance of AST for types.

The AST of types has two basic constructors: TypeVariable and TypeCons.

In the type expression language we use identifiers starting with a lower-case letter to indicate a type variable, unless they are applied to other type expression, in which case they’re regarded as type constructors. Identifiers starting with an upper-case letter always denote a type constructor.

Examples:

>>> from xotl.fl.parsers.types import parse
>>> parse('a')
TypeVariable('a')
>>> parse('a b')
TypeCons('a', (TypeVariable('b'),))
>>> parse('a B c')
TypeCons('a', (TypeCons('B', ()), TypeVariable('c')))

Notice that the type variable ‘c’ is an argument for the type constructor ‘a’, and not for ‘B’. You can use parenthesis to make it so:

>>> parse('a (B c)')
TypeCons('a', (TypeCons('B', (TypeVariable('c'),)),))

The function type constructor is the arrow ‘->’:

>>> parse('a -> B')
TypeCons('->', (TypeVariable('a'), TypeCons('B', ())))

The list type constructor is the pair of brackets ‘[]’:

>>> parse('[a]')
TypeCons('[]', (TypeVariable('a'),))

The tuple type constructor is just types enclosed in parenthesis and separated by commas:

>>> parse('(a, b)')
TypeCons(',', (TypeVariable('a'), TypeVariable('b')))
>>> parse('(a, a -> c, c)')
TypeCons(',,', (TypeVariable('a'), ...

Even though the type expression language recognizes those type constructions specially there’s nothing really special about them in terms of the type language AST; they are simply TypeCons with some funny names; for which we expect that components that assign meaning to these constructions (i.e semantics) assign them with the usual ones.

At the moment, tuples cannot have just one component.

The tuple with 0 components is the unit type:

>>> parse('()')
TypeCons('Unit', ())

The unit type has a single value, the unit value:

>>> from xotl.fl.parsers.expressions import parse as parse_expression
>>> parse_expression('()')
Literal((), TypeCons('Unit', ()))

Type schemes

Type schemes express (explicitly) the notion of universal qualification in type expressions (for all).

You may use the keyword forall to create type schemes explicitly:

>>> parse('forall a b. (a, b)')
<TypeScheme: forall a b. (a, b)>

Also, the classmethod from_typeexpr() creates type schemes from other types expressions:

>>> from xotl.fl.ast.types import TypeScheme
>>> TypeScheme.from_typeexpr(parse('(a, b)')) == parse('forall a b. (a, b)')
True

When you annotate any name, the parser creates type schemes implicitly:

>>> from xotl.fl import parse
>>> parse('id :: a -> a') == parse('id :: forall a. a -> a')
True

New lines

You can split long type expressions in several lines, but you only do so in a controlled manner:

  • You can’t break between constructors and its arguments, nor within the arguments themselves; unless you use parenthesis.
  • You can’t break before the arrow ‘->’, but breaking after it is OK, but also you need to indent the rest of the type expression.

Quirks of type expression language

TypeCons does not have an implicit limit to the type arguments any given constructor admits. This is the job of the semantic analyzer. This also means that the parser has a very liberal rule about type arguments in a constructor:

Any type expression to the left of a space and another type expression is admitted it as an argument.

This makes the parser to recognize funny, unusual types expressions:

>>> from xotl.fl.parsers.types import parse as parse_type
>>> parse_type('[a] b')
TypeCons('[]', (TypeVariable('a'), TypeVariable('b')))
>>> parse_type('(a -> b) c')
TypeCons('->', (TypeVariable('a'), TypeVariable('b'), TypeVariable('c')))

Those types have no semantics assigned but the parser recognizes them. It’s the job of another component (kinds?) to recognize those errors.

Type classes and instances

Type classes allow to overload operators over many possible implementations. They were introduced in Haskell and formalized in [Wadler1989].

The syntax to define a type class is like:

class [constraints =>] <ClassName> <type variable> where
     <... type class body ...>

Examples:

>>> from xotl.fl import parse
>>> Eq = parse('''
... class Eq a where
...     (==) :: a -> a -> Bool
... ''')[0]
>>> Ord = parse('''
... class Eq a => Ord a where
...     (<)  :: a -> a -> Bool
...     (<=) :: a -> a -> Bool
...     (<=) a b = a < b `or` a == b
... ''')[0]

Instances provide the implementations of type classes for types. Assuming _eq_number is builtin with type Number -> Number -> Bool, you could say:

>>> _eq_num_instance = parse('''
... instance Eq Number where
...     (==) :: Number -> Number -> Bool
...     (==) = _eq_number
... ''')[0]

Instances must constrain all it’s variables:

>>> _eq_either = parse('''
... instance Eq a, Eq b => Eq (Either a b) where
...     (==) (Left a) (Left b)   = a == b
...     (==) (Right a) (Right b) = a == b
...     (==) _         _         = False
... ''')

Data types can derive the instances of some builtin type classes:

>>> dt = parse('''
... data Either a b = Left a | Right b
...                   deriving (Eq)
... ''')[0]

This creates the same instance of Eq as the one shown above.