stages -
    break into lines
    count tabs
    is it a continuing line? (indented with spaces)
    join lines / blocks into statements
    tokenize a line
    split into expressions - operators - brackets
    bracketed expressions / statements ( )
    blocks in { } ???

operators -
    unary minus must have a space or a different operator before it,
    but a symbol or a number after it:
        -3.141      (ok)
        - 3.141     (no)
        -foo        (ok)
        - foo       (no)
        cow*-1      (ok, but not, see below, adjacent operators are forbidden)
        cow * -1    (ok)
        -1*cow

types of tokens -
    newline tab space number symbol operator string char

how to decide what an operator is, i.e. if two or more operators are
adjacent, are they one multi-char op, or two single-char ops?
    - enumerate them all, like C? - this is not very "generic"
    - must put spaces between adjacent operators? - I think this is reasonable
        1+2+3           this is ok
        1 + 2 + 3       this is ok too
        1 + -2          ok
        1 + -sin(3)     this is ok
        a +=-2          this is no good
        a += -2         this is ok

brackets () {} [] are treated specially; they are not operators
    - what about <> ? these are for template temp
        a(-3)           this is ok: ( and - are not lumped together

how to distinguish binary and unary operators from syntax?
    - unary operators are like unary minus: must be adjacent to the
      following number / symbol / bracket:
        !foo            (ok)
        !(a || b)       (ok)
        ! (a || b)      (no)
        ~foo            (ok)
        ~ foo           (no)

how to distinguish expressions from lists of arguments?
    - in list context, spaces separate arguments; expressions must not
      contain spaces:
        print 1+2 3 4/10 {foo 3.123}
        print 1 + 2     - this would print 1, the function +, and 2
        print 'this is nice' + 'the end'
        print  1 + 2    - this would print 3
    - top-level function call / arguments are in list context, like the shell
    - can call a function with list-context syntax with { }
    - in expression context, can have spaces between operators:
        a := a + b + c/2
    - in list context, if two spaces in a row, anything joined with one
      space is an expression
    - operator precedence - all determined by spacing:
        a = 1+2*3       - this sets a = 9
        a = 1 + 2*3     - this sets a = 7
        a  =  1  +  2*3 - this tries to call a function "a" with args =, 1, +, 6
                        - or maybe sets a = 1, then returns 7 - this sucks!
    - all this doesn't work: how to choose which is an operator, which an
      operand, which a function to call?
    - like to be able to have "symbol" operators too: true and false
    - could only choose which is an operator by using macro patterns.
    - macro patterns have priority / precedence?

        print 1 + 2 3 4 sin(10) / 5
        print `Hello World'

strings use ` ' as quotes, these nest, like brackets. use " for apostrophe ???
or, could use ` " for quotes, ' for apostrophe.
or could use {} or [] for quoting?
    print {hello world}
    print [hello world]
    print `hello world'

    a+b/2
    1+2+3
    1 + 2 + 3
    this_is_cool*asdf_qwer
    5*qwer
    (hello world)
    f(10 20)
    (f 10 20)

ok, here is an evaluation model:
    1. bracketed
    2. individual terms
    3. adjacent tokens (no space) are bracketed
    4. tokens separated by 1 space are bracketed
    5. tokens separated by 2 spaces are bracketed
    6. ...
    7. within each list of tokens, patterns are tried one-by-one, by precedence
    8. within brackets, a "," splits the brackets:
        foo(1,2) -> foo 1 2
        foo(1 2) -> foo (1 2)   (1 arg)
    9. ";" is like C's "," (sequence of expressions) and ";" (sequence of
       statements), it is equivalent to a newline.
       - ";" and "," are in fact equivalent in nipl, as the different
         brackets are equivalent
    10.
array indexing...
    a[10]
    - the array behaves like a function
    - in fact it is a macro: map a index to address(a)+index*sizeof(element(a))
    - a(10) or a{10} or (a 10) would do as well.

    foo 1 ; foo 2 , foo 3
    for a=1 a<10 ++a print a

left-to-right - how to do this efficiently?
    2^3^4 -> 2^3 ^ 4 == 2 ^ 3*4
unlike the conventional right-to-left interpretation:
    2 ^ 3^4

I would prefer if I could completely parse (bracket) an expression without
having to know the meaning of the symbols. I think this is impossible though.

How about we use spacing, and the distinction between operators and
identifiers:
    1 + 2 * 3 -> 9
    1 + 2*3 -> 7

things we need to implement:
    - function calls / macros / syntactic structures
    - declarations
    - assignments
    - expressions

function call:
    print 1 2 3
    print 1+2 3 foo
    print 1 + 2 3 foo       -> this prints 1 + 2 3 foo
    print 1 + 2  3  foo     -> this prints: 3 3 foo
    print (1 + 2) 3 foo     -> this prints: 3 3 foo
    print  1 + 2            -> this prints: 3
    print(1 + 2, 3)         -> this prints 3 3

    address(a)+index*sizeof(element(a))
        -> (address a) + (index * (sizeof (element a)))
    address(a) + index*sizeof(element(a))
        : tokens _touching_ brackets are bound more strongly than normal
          tokens touching

    1 + 2*3 - 7/5
    7 / 1+2
these are ok. so we don't need special precedence rules

    a && b || c && d
    a && b || c && d -> a && b || c && d == ((a && b) || c) && d

interpreting lists into macro calls has to be done in a certain order, e.g.
supposing f and g are functions, and + is a function compose operator.
    f + g
        -> f(+,g)
        -> (f)+(g)
actually, in nipl these two are equivalent anyway, both map to:
    f + g
-> no, this is broken!

how about anything in parens cannot match the operator in some syntax, only
an argument? so:
    (print) 1 2
would be a list, not a function call. could do
    call (print) 1 2 -> print 1 2
-> no, this is broken too!
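
a rough sketch of the spacing idea above, in Python rather than nipl. assumptions, none of them from the notes: numbers only, the five operators + - * / ^, no brackets or names, and the helper name `evaluate` is made up. it splits at the widest run of spaces first, so tokens that touch bind tightest, and then folds binary ops strictly left to right. a leading "-" in a group is taken as unary minus.

```python
import re

# Hypothetical operator table for the sketch; "^" is exponentiation here.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "^": lambda a, b: a ** b,
}

NUMBER = r"\d+(?:\.\d+)?"


def evaluate(text):
    """Evaluate text, letting spacing alone decide the grouping."""
    text = text.strip()
    if re.fullmatch(NUMBER, text):
        return float(text)

    # Widest run of spaces in this (sub)expression; 0 means all tokens touch.
    widest = max((len(run) for run in re.findall(r" +", text)), default=0)
    if widest == 0:
        # Touching tokens: split into numbers and single-char operators.
        parts = re.findall(NUMBER + r"|[-+*/^]", text)
    else:
        # Split only at the widest gaps; narrower gaps stay inside a part
        # and are handled by the recursive call.
        parts = text.split(" " * widest)
    if len(parts) < 2:
        raise ValueError("cannot parse: %r" % text)

    # A leading "-" is unary and is applied first.
    if parts[0] == "-":
        value, rest = -evaluate(parts[1]), parts[2:]
    else:
        value, rest = evaluate(parts[0]), parts[1:]
    if len(rest) % 2:
        raise ValueError("dangling operator in: %r" % text)

    # Binary operators are applied strictly left to right.
    for op, operand in zip(rest[0::2], rest[1::2]):
        value = OPS[op](value, evaluate(operand))
    return value
```

with this, evaluate("1 + 2 * 3") gives 9.0 and evaluate("1 + 2*3") gives 7.0, matching the two examples above. adjacent operators such as 1*-2 raise ValueError, in line with the "must put spaces between adjacent operators" rule.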
how about anything in brackets cannot be tried as an operator UNLESS the
rest fails to parse, i.e.:
    (print) + 1 -> (print) + (1)
vs
    (print) 1 -> (print) (1) -> print 1
-> what happens to:
    1 2 3
can't call 1 as a function! or can we? it could be an address I guess.

    f g 10 --> f (g) (10)   not f(g 10)

back to f + g ...
whether `f' is called with args `+' and `g': f(+,g) or f (+) (g);
or `+' is called with args `f' and `g'
depends on the relative precedence of the patterns:
    (f) + (g)
and
    f (x) (y)

do we support right-left application, e.g. for
    2^3^4 -> 2 ^ 3^4,
    2^3^4^5^6^7 -> 2^(3^(4^(5^(6^7))))
also, unary ops should bind right-to-left too.
    --5 -> -(-5), not (-(-)
    !!6 -> !(!6)
this is forbidden anyway, as unary ops must not be followed by a space, but
must be preceded by a space. would have to be !(!6) or a = ! !6 ???

ok, so we don't need special precedence rules. At a particular level, if the
first element is an operator, it is a unary op and is applied first. Then
binary ops happen from left to right. Only an operator (i.e. not a textual
name) can be a binary op.

    5 + - 7^2 + 3 == 5 + -49 + 3
        --> ((5) + (- (^ 7 2))) + (3) == 5 + -(7^2) + 3
    5 + -7^2 + 3
        --> 5 + (-7)^2 + 3 == 5 + 49 + 3
    5 + - 7^2 + 3
        --> (5) + (-) 7^2 + 3
        --> ((5) + (-)) (7^2 + 3)
        --> ((5) + (-)) ((7^2 + 3))

    1 + 2 3 + 4
    (f + g) 10

the "ternary op" a ? b : c is out, this would be parsed as:
    (a ? b) : (c)
or if written:
    a ? b:c
    (a) ? ((b) : (c))
    If : --> lambda (a) { if a { b } else { c } }

Function call is not a normal unary operation, the parsing stops when it's
parsed to a list. Function call is the interpretation of that list within a
command context. So:
    lambda (a b c) if a b else c
is ok, the a b c is interpreted as a list
perhaps should have a better interpretation

can we write
    main argc argv ...
instead of
    main(argc argv)
perhaps
    func main argc argv ...
would be better?

Actually, I think these DO need to go in brackets:
    func main(argc argv) ...
or
    func main argc argv . ...
or
    func main int argc char** argv . ...

regarding type decl, better like this:
or
    func main argc int argv char** . print argv[10]*
where * is a unary suffix operator
or
    func main int argc char[]* argv . print argv[10]*

how to declare a function pointer?
or
    func int argc char[]* argv main = print argv[10]*
this fits the `type name' declaration model

how about we use `=' for declaration?
    foo = int
    main = func argc = int argv = char[]* . print argv[10]*
or
    main = func argc argv . print argv[10]*
or, equivalently:
    main = func argc argv . print argv[10]*

that's enough for tonight, it's 11:30 already! goodnight

2005.01.17"23:33:30