imiskolee

Mathematical expressions are not parsed correctly

<?php

echo 2 * (2 - 3);
(ast.EchoStmt)Echo
       	(ast.BinaryExpr)-
       		(*ast.Literal)Literal-float: 2
       		(*ast.Literal)Literal-float: 3
silbinarywolf

Lexer identifies class as Class token type in "$this->class" context.

Not sure if this is intentional or a non-issue for your tooling, but I was hoping to iterate over each token and find class declarations this way. Instead, I have to check whether the previous token is an ObjectOperator or ScopeResolutionOperator.
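
For illustration, the previous-token check described above could look roughly like the sketch below. The `Token` and `TokenType` definitions are hypothetical stand-ins, not the lexer's actual types, and the sketch assumes whitespace tokens have already been filtered out of the slice.

```go
package main

import "fmt"

// TokenType and Token are hypothetical stand-ins for the lexer's own types.
type TokenType int

const (
	Other TokenType = iota
	Class
	Identifier
	ObjectOperator          // ->
	ScopeResolutionOperator // ::
)

type Token struct {
	Typ TokenType
	Val string
}

// isClassDeclaration reports whether tokens[i] is a real `class` keyword,
// rather than a member named "class" that follows -> or ::.
func isClassDeclaration(tokens []Token, i int) bool {
	if tokens[i].Typ != Class {
		return false
	}
	if i == 0 {
		return true
	}
	prev := tokens[i-1].Typ
	return prev != ObjectOperator && prev != ScopeResolutionOperator
}

func main() {
	// Tokens for: $this->class
	access := []Token{{Identifier, "$this"}, {ObjectOperator, "->"}, {Class, "class"}}
	// Tokens for: class Foo
	decl := []Token{{Class, "class"}, {Identifier, "Foo"}}

	fmt.Println(isClassDeclaration(access, 2)) // false
	fmt.Println(isClassDeclaration(decl, 0))   // true
}
```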

silbinarywolf

Lexer hangs on bad PHP file from Codeception

I was looking at this project with a view to building a fast PHP autocomplete daemon. Out of curiosity, is there a reason the lexer runs in a goroutine? Wouldn't it be the job of the parser or user code to run the lexer in a goroutine (if at all)?

File:
https://github.com/Codeception/Codeception/blob/2.2/tests/data/Invalid.php

File Contents:

<?php
$I do nothing here

My Go loop:

for {
	token := lexer.Next()
	if token.Typ == PHPToken.EOF {
		break
	}
	// ... more
}
Briareos

Line-comments eat up everything on files using only carriage-return

If the file uses only CR instead of LF/CRLF as line endings, the parser will treat everything after a line comment as part of that line comment.

Example (the \r character represents carriage-return, there's no \n anywhere in the file):

<?php\r
// comment\r
$a = 1;

Will yield // comment\r$a = 1; as a line comment.

To fix that I changed

func lexLineComment(l *lexer) stateFn {
    lineLength := strings.Index(l.input[l.pos:], "\n") + 1

to

func lexLineComment(l *lexer) stateFn {
    lineLength := strings.IndexAny(l.input[l.pos:], "\r\n") + 1

and it seems to work fine.
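
As a self-contained illustration of the change above, the sketch below computes the comment length with `strings.IndexAny`. The helper name `lineCommentLength` is hypothetical and not part of the lexer. The sketch also returns the full remaining length when no terminator is found, which is an assumption added here rather than something the original snippet shows. Note that for CRLF input this stops after the `\r` and leaves the `\n` for the next lexer state.

```go
package main

import (
	"fmt"
	"strings"
)

// lineCommentLength returns the number of bytes in the line comment that
// starts at the beginning of s, including one terminating \r or \n if present.
// If s has no line terminator, the comment runs to the end of the input.
func lineCommentLength(s string) int {
	i := strings.IndexAny(s, "\r\n")
	if i < 0 {
		return len(s)
	}
	return i + 1
}

func main() {
	src := "// comment\r$a = 1;"
	n := lineCommentLength(src)
	fmt.Printf("%q\n", src[:n]) // "// comment\r"
	fmt.Printf("%q\n", src[n:]) // "$a = 1;"
}
```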

Rivendall

Frontend CLI for parsing a whole project

Hello,
How do I run this library?
Does it take files and parse them one by one, rather than parsing a whole project?

Thank you

Briareos

Lexer - Infinite loop on identifiers that contain multibyte runes

The parser will hog the CPU when working on data like

<?php
$a = ‘test‘; // Note that ‘ is a 3-byte UTF-8 character %E2%80%98, not a single quote 

I get those files from users who copy-paste PHP code from online articles that apply formatting to PHP code. The reason it hangs is that *lexer.acceptRun doesn't accept multibyte characters, since they are not in the valid set. A quick fix I made was to add an additional check with utf8.RuneLen to accept multibyte strings, but I'm not sure it's the best workaround.
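
One possible shape for such a workaround is sketched below. It is not the lexer's actual `acceptRun`; the function `acceptIdentRun` and the `identChars` set are hypothetical. The sketch decodes one rune at a time and treats every non-ASCII rune as part of the run, on the assumption that this matches PHP's rule allowing bytes 0x80 to 0xff in identifiers. Because `utf8.DecodeRuneInString` returns a width of at least 1 for non-empty input, the loop always makes progress, even on invalid UTF-8.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// identChars is the ASCII part of the accepted set (hypothetical).
const identChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"

// acceptIdentRun returns the byte length of the leading run of s made up of
// ASCII identifier characters and any non-ASCII runes.
func acceptIdentRun(s string) int {
	pos := 0
	for pos < len(s) {
		r, width := utf8.DecodeRuneInString(s[pos:])
		// ASCII runes must be in the valid set; everything else is accepted.
		if r < utf8.RuneSelf && !strings.ContainsRune(identChars, r) {
			break
		}
		pos += width // width is 1 for invalid bytes, so this always advances
	}
	return pos
}

func main() {
	fmt.Println(acceptIdentRun("‘test‘;")) // 10: two 3-byte quotes plus "test"
	fmt.Println(acceptIdentRun("a‘b = 1")) // 5: "a", one 3-byte quote, "b"
}
```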

otremblay

Line number in AST Node?

Hi,

I'm using your tool to parse PHP code with the secret hope of building some kind of static analysis tool with it; I wondered if it would be feasible and/or desirable to track line numbers in Node structures? I don't mind lending a hand for that, but if it's definitely unwanted I'd rather not waste time :)
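
To make the idea concrete: if each node (or token) kept the byte offset where it begins, a line number could be derived on demand. The sketch below assumes such an offset is available, which the original post does not confirm, and the helper `lineAt` is hypothetical. It counts only `\n` as a line break, so CR-only files would need extra handling.

```go
package main

import (
	"fmt"
	"strings"
)

// lineAt returns the 1-based line number of byte offset pos in src,
// counting only \n as a line break. Out-of-range offsets are clamped.
func lineAt(src string, pos int) int {
	if pos < 0 {
		pos = 0
	}
	if pos > len(src) {
		pos = len(src)
	}
	return strings.Count(src[:pos], "\n") + 1
}

func main() {
	src := "<?php\n$a = 1;\necho $a;\n"
	pos := strings.Index(src, "echo")
	fmt.Println(lineAt(src, pos)) // 3
}
```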

gleamingthecube

Parser errors do not have the proper line number

Parser errors always display line=0 instead of the real line number.

gleamingthecube

Remove reference-or-not inconsistencies when manipulating AST objects

Several places in the code look for ast.Foo but also *ast.Foo.
Example in ast/printer/printer.go:

    case ast.AssignmentExpression:
        p.PrintAssignmentExpression(&n)
    case *ast.AssignmentExpression:
        p.PrintAssignmentExpression(n)

At the end of the day, we cannot be sure whether the AST holds values or pointers for a given node type.
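
One way to remove the duplication, sketched here with hypothetical stand-in types rather than the real ast package, is to normalize value nodes to pointers once, so that every later type switch needs only the pointer case. Whether the AST should instead store pointers everywhere is a separate design decision.

```go
package main

import "fmt"

// Node and AssignmentExpression are hypothetical stand-ins for ast types.
type Node interface{}

type AssignmentExpression struct {
	Name string
}

// normalize converts a value-typed node into its pointer form so that
// later type switches only need one case per node type.
func normalize(n Node) Node {
	switch v := n.(type) {
	case AssignmentExpression:
		return &v
	default:
		return n
	}
}

// describe handles only the pointer form and relies on normalize for the rest.
func describe(n Node) string {
	switch v := normalize(n).(type) {
	case *AssignmentExpression:
		return "assignment to " + v.Name
	default:
		return fmt.Sprintf("unhandled %T", v)
	}
}

func main() {
	fmt.Println(describe(AssignmentExpression{Name: "$a"}))  // assignment to $a
	fmt.Println(describe(&AssignmentExpression{Name: "$a"})) // assignment to $a
}
```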

gleamingthecube

Parser issue with nested if/else blocks.

test file:

<?php
if (true)
    if (true)
        echo 1;
    else
        echo 2;

if (true) echo 1;
?>

got:

Found echo:"echo", expected [;]