A handwritten fault-tolerant, recursive-descent parser for PHP written in Rust.

Overview

PHP-Parser

Warning: this is still alpha software, and the public API is subject to change. Please use it at your own risk.


Usage

Add php-parser-rs to the [dependencies] section of your Cargo.toml:

[dependencies]
php-parser-rs = { git = "https://github.com/php-rust-tools/parser" }

Or use cargo add:

cargo add php-parser-rs --git https://github.com/php-rust-tools/parser

Example

use std::io::Result;

use php_parser_rs::parser;

const CODE: &str = r#"<?php

final class User {
    public function __construct(
        public readonly string $name,
        public readonly string $email,
        public readonly string $password,
    ) {
    }
}
"#;

fn main() -> Result<()> {
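    // On success, `parse` returns the AST; on failure, the error value
    // still carries what was parsed (`err.partial`).
    match parser::parse(CODE) {
        Ok(ast) => {
            // Pretty-print the whole AST.
            println!("{:#?}", ast);
        }
        Err(err) => {
            // Render a human-readable error report against the original source.
            println!("{}", err.report(CODE, None, true, false)?);

            // Because the parser is fault-tolerant, the partial AST is still available.
            println!("parsed so far: {:#?}", err.partial);
        }
    }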

    Ok(())
}

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Credits

Comments
  • fix: new ast for some language constructs

    This is a work in progress to address the issue discussed here: https://github.com/php-rust-tools/parser/issues/258

    Some notes:

    1) Currently, die() and exit() are both expressions. I have kept them that way, but I am wondering whether it would be more concise to construct them as statements instead. This is tricky because they behave like functions, but they are not.

    I have coded the start and end spans as Option<Span> for die() and exit(), since parentheses are optional for them: https://github.com/php-rust-tools/parser/commit/1a0fd301818da1db7e05a3d2236bdb893de8f5b8

    print() handles this differently, representing its argument as value: Parenthesized { ... }. See:

    https://github.com/php-rust-tools/parser/blob/main/tests/fixtures/0269/ast.txt#L349-L355

    2) print is treated as an expression, but it was coded differently: https://github.com/php-rust-tools/parser/blob/main/src/parser/expressions.rs#L1099

    (Shouldn't Precedence::Prefix be Precedence::Print here?)

    I am wondering whether die() and exit() should follow the same ideas used in print() or if we need to change print() to follow die() and exit().

    3) isset() and unset() are special because they cannot accept every kind of argument. For example, isset(null) is not valid. Therefore, we need to open a new issue describing the need for a dedicated function that parses their parameters and throws parser errors when necessary. I think this could be work for another PR.

    Not trivial to decide those things. :sweat_smile:
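
The argument check proposed in point 3 might look roughly like the following minimal sketch. The Expr enum and the check_isset_args function are hypothetical stand-ins, not php-parser-rs's actual AST. The sketch accepts only variables, property fetches, and array element accesses, which simplifies PHP's real rules (for instance, static property fetches are also allowed).

```rust
/// Hypothetical, heavily simplified expression AST (not php-parser-rs's own types).
#[derive(Debug)]
enum Expr {
    Variable(String),
    PropertyFetch(Box<Expr>, String),
    ArrayIndex(Box<Expr>, Box<Expr>),
    Null,
    Call(String),
}

/// Returns true if `e` is something isset() (or unset()) can operate on.
fn is_isset_target(e: &Expr) -> bool {
    matches!(e, Expr::Variable(_) | Expr::PropertyFetch(..) | Expr::ArrayIndex(..))
}

/// Validates the argument list of isset(), producing a parser-style error message.
fn check_isset_args(args: &[Expr]) -> Result<(), String> {
    if args.is_empty() {
        return Err("isset() expects at least one argument".to_string());
    }
    for (i, arg) in args.iter().enumerate() {
        if !is_isset_target(arg) {
            return Err(format!(
                "cannot use isset() on the result of an expression (argument {}: {:?})",
                i + 1,
                arg
            ));
        }
    }
    Ok(())
}
```

The same predicate could plausibly be reused for unset(), which restricts its arguments in a similar way.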

    opened by KennedyTedesco 9
  • fatal runtime error: stack overflow

    When running the test suite on my computer (Linux, 6-core Intel i5), I got this error:

    running 31 tests
    test composer ... ok
    test chubbyphp_framework ... ok
    test doctrine_dbal ... ok
    test api_platform ... ok
    
    thread 'mezzio-framework' has overflowed its stack
    fatal runtime error: stack overflow
    error: test failed, to rerun pass `--test third_party_tests`
    

    But, changing the stack size from:

    .stack_size(16 * 1024 * 1024)
    

    To:

    .stack_size(24 * 1024 * 1024)
    

    Fixes the issue.
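
For context, .stack_size(...) is a method on Rust's std::thread::Builder and takes a size in bytes, so the test harness presumably runs the parser on a spawned thread. Here is a minimal, self-contained sketch of that pattern. The nest function is a hypothetical stand-in for a recursive-descent parser walking deeply nested input, and run_with_stack is a helper name made up for this example.

```rust
use std::thread;

/// Hypothetical stand-in for a recursive-descent parser on deeply nested
/// input: each level of nesting costs one stack frame.
fn nest(depth: u32) -> u32 {
    if depth == 0 {
        0
    } else {
        1 + nest(depth - 1)
    }
}

/// Runs `f` on a new thread whose stack is `stack_bytes` bytes, the same
/// setting as `.stack_size(16 * 1024 * 1024)` in the report.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("deep-parse".to_string())
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```

Raising the limit treats the symptom. One structural alternative would be to cap the nesting depth, or to turn the deepest recursion into iteration.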

    Are you planning any changes to this part, or would you like me to open a PR for it (that is, to change the stack size)?

    opened by KennedyTedesco 9
  • chore: aesthetic updates to tests

    Hi everyone!

    I'm excited to make my first contribution to the project. I recently started learning about parsers and interpreters, and I've been really interested in exploring the subject further. As a beginner in both the subject and Rust, I find that working on this project is a great way to learn.

    I noticed that there were some areas of the codebase where the style and formatting could be improved, so I took the time to make some changes.

    Are you guys open to accepting this kind of change?

    Cheers!

    opened by KennedyTedesco 5
  • support more types

    First commit: adds support for more type nodes, disallows combining standalone types, adds proper union and intersection type representations, and adds proper support for the types caught in try blocks.

    Second commit: adds a small integration test runner. Each test folder must contain code.php. If the code results in lexer errors, a lexer-error.txt file containing the formatted error must be created next to code.php; if the code results in parser errors, a parser-error.txt file containing the formatted error must be created next to code.php.
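
Here is a rough sketch of how such a runner might discover fixtures and work out what each one expects, based only on which files sit next to code.php. The Expectation enum, expectation_for, and collect_fixtures are hypothetical names. Actually lexing or parsing code.php and comparing the formatted error against the .txt file is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What a fixture folder expects, inferred from the files next to `code.php`.
#[derive(Debug, PartialEq)]
enum Expectation {
    /// `lexer-error.txt` exists: lexing must fail with this formatted error.
    LexerError(PathBuf),
    /// `parser-error.txt` exists: parsing must fail with this formatted error.
    ParserError(PathBuf),
    /// Neither file exists: the code should lex and parse cleanly.
    Success,
}

/// Returns `None` for folders that are not fixtures (no `code.php`).
fn expectation_for(dir: &Path) -> Option<Expectation> {
    if !dir.join("code.php").is_file() {
        return None;
    }
    let lexer = dir.join("lexer-error.txt");
    if lexer.is_file() {
        return Some(Expectation::LexerError(lexer));
    }
    let parser = dir.join("parser-error.txt");
    if parser.is_file() {
        return Some(Expectation::ParserError(parser));
    }
    Some(Expectation::Success)
}

/// Lists every fixture directly under `root`, sorted by path for stable output.
fn collect_fixtures(root: &Path) -> io::Result<Vec<(PathBuf, Expectation)>> {
    let mut fixtures = Vec::new();
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            if let Some(expectation) = expectation_for(&path) {
                fixtures.push((path, expectation));
            }
        }
    }
    fixtures.sort_by(|a, b| a.0.cmp(&b.0));
    Ok(fixtures)
}
```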

    opened by azjezz 4
  • Parse readonly properties and classes

    Readonly properties, which are required to have type declarations:

    <?php
    
    class C {
        readonly int $a;
        readonly protected string $b;
        readonly static stdClass $c;
    
        function __construct(readonly bool $d) {}
    }
    

    Readonly classes (coming in PHP 8.2):

    <?php
    
    readonly class C {}
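
The type-declaration requirement mentioned above is the kind of rule a parser can enforce once modifiers have been collected. Below is a minimal sketch under stated assumptions: Modifier, Property, modifier_from_keyword, and check_property are hypothetical names, not php-parser-rs's API. Keywords are matched case-insensitively because PHP keywords are case-insensitive, and modifiers may appear in any order (as in readonly protected string $b).

```rust
/// Hypothetical property modifiers (a subset of PHP's).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Modifier {
    Public,
    Protected,
    Private,
    Static,
    Readonly,
}

/// Maps a keyword to a modifier; PHP keywords are case-insensitive.
fn modifier_from_keyword(keyword: &str) -> Option<Modifier> {
    match keyword.to_ascii_lowercase().as_str() {
        "public" => Some(Modifier::Public),
        "protected" => Some(Modifier::Protected),
        "private" => Some(Modifier::Private),
        "static" => Some(Modifier::Static),
        "readonly" => Some(Modifier::Readonly),
        _ => None,
    }
}

/// Hypothetical parsed property declaration.
struct Property<'a> {
    modifiers: Vec<Modifier>,
    ty: Option<&'a str>,
    name: &'a str,
}

/// Rejects readonly properties that lack a type declaration.
fn check_property(prop: &Property<'_>) -> Result<(), String> {
    if prop.modifiers.contains(&Modifier::Readonly) && prop.ty.is_none() {
        return Err(format!("readonly property ${} must have a type", prop.name));
    }
    Ok(())
}
```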
    
    parser 
    opened by edsrzf 4
  • Lex and parse in terms of bytes rather than characters

    Fixes #26.

    I've tried to make the conversion as straightforward as possible, so this might be suboptimal in some respects, but I figure we can do further refactoring later. All tests pass.

    One thing that's worse now is test-failure output. The Debug impl for Vec<u8> prints things like [50, 51, 52], which isn't very human-readable.

    We could look at pulling in a dependency like bstr, or otherwise adding a custom Debug impl. Let me know what you think and I can either add it onto this PR or do it as a separate one.
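
One option for the readability problem, sketched here under the assumption that a small newtype is acceptable instead of a dependency like bstr, is to wrap the bytes and implement Debug using the standard library's slice method escape_ascii (stable since Rust 1.60). That method prints printable ASCII as-is and escapes everything else. ByteString is a hypothetical name, not a type in this codebase.

```rust
use std::fmt;

/// Hypothetical newtype so test failures show `b"234"` instead of `[50, 51, 52]`.
#[derive(Clone, PartialEq, Eq)]
pub struct ByteString(pub Vec<u8>);

impl fmt::Debug for ByteString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // escape_ascii keeps printable ASCII and escapes quotes, backslashes,
        // control characters, and non-ASCII bytes (e.g. \n, \", \xff).
        write!(f, "b\"{}\"", self.0.escape_ascii())
    }
}
```

With this in place, assert_eq! failures on token values would print readable byte strings.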

    opened by edsrzf 4
  • chore: use `clap` to have command line arguments

    This PR:

    • Let clap handle the command line arguments and options

    I'm pretty new to Rust, so please let me know if I'm doing something wrong. I'm here to learn :)

    opened by drupol 3
  • feat: add support for generics

    • [x] class templates class Foo<T> {}
    • [x] interface templates interface Foo<T> {}
    • [x] method templates public function bar<T>(T $s): T {}
    • [x] function templates function bar<T>(T $s): T {}
    • [x] function call generics foo::<string>()
    • [x] static method call generics Foo::bar::<string>()
    • [x] method call generics $bar->foo::<string>()
    • [x] class instantiation with generics new Foo::<string>()
    • [x] generic types function foo(Bar<string> $s) {}
    • [x] covariance templates +T
    • [x] contravariance templates -T
    • [x] template sub-type constraint T as Foo
    • [x] template super-type constraint T super Foo
    • [x] template equal-type constraint T = Foo
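
The template-parameter syntax in the checklist (the variance markers and the as / super / = constraints) could be modelled roughly as follows. This is a minimal sketch, not the PR's implementation. Variance, Constraint, TemplateParam, and parse_template_param are hypothetical names, and the sketch assumes the pieces are separated by whitespace rather than working on real lexer tokens.

```rust
/// Hypothetical template-parameter representation (not the PR's actual AST).
#[derive(Debug, PartialEq)]
enum Variance {
    Invariant,
    Covariant,     // written `+T`
    Contravariant, // written `-T`
}

#[derive(Debug, PartialEq)]
enum Constraint {
    Unconstrained,
    SubType(String),   // `T as Foo`
    SuperType(String), // `T super Foo`
    Equal(String),     // `T = Foo`
}

#[derive(Debug, PartialEq)]
struct TemplateParam {
    name: String,
    variance: Variance,
    constraint: Constraint,
}

/// Parses one template parameter such as `+T as Foo`.
/// Assumes whitespace-separated pieces; returns None on malformed input.
fn parse_template_param(src: &str) -> Option<TemplateParam> {
    let mut tokens = src.split_whitespace();
    let head = tokens.next()?;
    let (variance, name) = if let Some(rest) = head.strip_prefix('+') {
        (Variance::Covariant, rest)
    } else if let Some(rest) = head.strip_prefix('-') {
        (Variance::Contravariant, rest)
    } else {
        (Variance::Invariant, head)
    };
    if name.is_empty() {
        return None;
    }
    let constraint = match (tokens.next(), tokens.next()) {
        (None, _) => Constraint::Unconstrained,
        (Some("as"), Some(ty)) => Constraint::SubType(ty.to_string()),
        (Some("super"), Some(ty)) => Constraint::SuperType(ty.to_string()),
        (Some("="), Some(ty)) => Constraint::Equal(ty.to_string()),
        _ => return None,
    };
    if tokens.next().is_some() {
        return None; // trailing input after the constraint
    }
    Some(TemplateParam { name: name.to_string(), variance, constraint })
}
```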
    opened by azjezz 3
  • Reflection

    Something I've experimented with a little bit is a reflection API provided by the parser.

    You provide it with a generated AST (or multiple ASTs), and the struct traverses them to find function declarations, classes, etc.

    You would then be able to call methods on the struct to find a particular class, function, get type information, params etc.

    This would do a lot of heavy lifting for static analysis tools, dead code finders, interpreters etc.
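
Here is a minimal sketch of the idea. It uses a toy AST rather than php-parser-rs's real node types: Statement, Param, Reflection, and its methods are all hypothetical. The index is built once by walking each AST, and lookups are case-insensitive because PHP class and function names are case-insensitive.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal AST for the sketch (not php-parser-rs's node types).
#[derive(Debug, Clone)]
struct Param {
    name: String,
    ty: Option<String>,
}

#[derive(Debug, Clone)]
enum Statement {
    Namespace { name: String, body: Vec<Statement> },
    Class { name: String },
    Function { name: String, params: Vec<Param>, return_type: Option<String> },
}

/// Index of declarations built by walking one or more ASTs.
#[derive(Default)]
struct Reflection {
    functions: HashMap<String, Statement>,
    classes: HashMap<String, Statement>,
}

/// Builds a lower-cased `Namespace\Name` lookup key.
fn key(namespace: &str, name: &str) -> String {
    let fqn = if namespace.is_empty() {
        name.to_string()
    } else {
        format!("{}\\{}", namespace, name)
    };
    fqn.to_ascii_lowercase()
}

impl Reflection {
    /// Adds every declaration from `ast`; call it once per parsed file.
    fn add_ast(&mut self, ast: &[Statement]) {
        self.walk(ast, "");
    }

    fn walk(&mut self, statements: &[Statement], namespace: &str) {
        for statement in statements {
            match statement {
                Statement::Namespace { name, body } => self.walk(body, name),
                Statement::Class { name } => {
                    self.classes.insert(key(namespace, name), statement.clone());
                }
                Statement::Function { name, .. } => {
                    self.functions.insert(key(namespace, name), statement.clone());
                }
            }
        }
    }

    fn class(&self, fqn: &str) -> Option<&Statement> {
        self.classes.get(&fqn.to_ascii_lowercase())
    }

    fn function(&self, fqn: &str) -> Option<&Statement> {
        self.functions.get(&fqn.to_ascii_lowercase())
    }

    /// Parameters of a function, if it was declared in one of the ASTs.
    fn function_params(&self, fqn: &str) -> Option<&[Param]> {
        match self.function(fqn)? {
            Statement::Function { params, .. } => Some(params.as_slice()),
            _ => None,
        }
    }
}
```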

    future 
    opened by ryangjchandler 3
  • lexer: first pass at string interpolation

    This implements most of double-quoted string interpolation in the lexer. The parser hasn't been modified yet.

    A couple things may need further tweaking:

    The particular tokens that are produced in certain cases. The PHP interpreter has some very specialized tokens (T_STRING, T_NUM_STRING, T_STRING_VARNAME) for string interpolation that I'm going to see if we can do without, and instead push the logic into the parser. If I'm able to do that, it might be possible to get rid of one or two of the extra lexer states added in this PR. I also may end up finding out that there's a good reason for wanting to do this work in the lexer.

    One other difference is that in the PHP interpreter, an interpolated string always begins and ends with a double-quote (") token. In my implementation, it always begins with a (possibly empty) TokenKind::StringPart and always ends with TokenKind::DoubleQuote. Beginning with a double quote would mean backtracking over what we've already consumed, while beginning with a string part still seems unambiguous.
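
The token layout described here (a possibly empty leading StringPart and a closing DoubleQuote) can be illustrated with a small sketch. It is not the PR's lexer. TokenKind has only three hypothetical variants, and only simple $name interpolation with ASCII names is handled ({$expr}, ${...}, array offsets, and property access are ignored). Escape sequences are kept verbatim.

```rust
/// Hypothetical token kinds, loosely mirroring the design described above.
#[derive(Debug, PartialEq)]
enum TokenKind {
    StringPart(String),
    Variable(String),
    DoubleQuote,
}

/// Pushes pending literal text. The first token is always a StringPart
/// (possibly empty); later empty parts are skipped.
fn flush(tokens: &mut Vec<TokenKind>, part: &mut String) {
    if tokens.is_empty() || !part.is_empty() {
        tokens.push(TokenKind::StringPart(std::mem::take(part)));
    }
}

/// Lexes the body of a double-quoted string, starting just after the opening `"`.
/// Returns None if the closing quote is missing.
fn lex_interpolated(body: &str) -> Option<Vec<TokenKind>> {
    let mut tokens = Vec::new();
    let mut part = String::new();
    let mut chars = body.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '"' => {
                flush(&mut tokens, &mut part);
                tokens.push(TokenKind::DoubleQuote);
                return Some(tokens);
            }
            '\\' => {
                // Escapes are kept verbatim; decoding them is out of scope here.
                part.push(c);
                if let Some(escaped) = chars.next() {
                    part.push(escaped);
                }
            }
            '$' if chars.peek().map_or(false, |n| n.is_ascii_alphabetic() || *n == '_') => {
                flush(&mut tokens, &mut part);
                let mut name = String::new();
                while let Some(&n) = chars.peek() {
                    if n.is_ascii_alphanumeric() || n == '_' {
                        name.push(n);
                        chars.next();
                    } else {
                        break;
                    }
                }
                tokens.push(TokenKind::Variable(name));
            }
            _ => part.push(c),
        }
    }
    None
}
```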

    The number handling in var_offset is also not quite right. See the TODO. I'll revisit once we've implemented all integer literals.

    Partially addresses #97.

    opened by edsrzf 3
  • parser/lexer: support __halt_compiler token

    Closes #71.

    Introduces a new LexerState::Halted variant. This tells the lexer to ensure that the token is followed by `();`, just as PHP itself does.

    It then collects the rest of the input into a single InlineHtml token (this could probably be done differently, but it works for now) and breaks out of the tokenization loop.
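
Here is a byte-level sketch of the behaviour described, using a plain function instead of a LexerState variant. Token and lex_after_halt are hypothetical. The sketch skips only whitespace between the required tokens and accepts only ; as the terminator. Real PHP is more lenient; for example, it also accepts a closing ?> in place of the semicolon.

```rust
/// Hypothetical tokens for the sketch.
#[derive(Debug, PartialEq)]
enum Token {
    HaltCompiler,
    LeftParen,
    RightParen,
    SemiColon,
    InlineHtml(Vec<u8>),
}

/// Called after `__halt_compiler` has been lexed, with the remaining input.
/// Requires `(`, `)` and `;` (whitespace allowed between them), then turns
/// everything after the `;` into a single InlineHtml token and stops.
fn lex_after_halt(rest: &[u8]) -> Result<Vec<Token>, String> {
    let mut tokens = vec![Token::HaltCompiler];
    let mut i = 0;
    for (expected, token) in [
        (b'(', Token::LeftParen),
        (b')', Token::RightParen),
        (b';', Token::SemiColon),
    ] {
        while i < rest.len() && rest[i].is_ascii_whitespace() {
            i += 1;
        }
        if rest.get(i) != Some(&expected) {
            return Err(format!("expected `{}` after __halt_compiler", expected as char));
        }
        tokens.push(token);
        i += 1;
    }
    // No further tokenization: the remainder is opaque data.
    tokens.push(Token::InlineHtml(rest[i..].to_vec()));
    Ok(tokens)
}
```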

    opened by ryangjchandler 3
  • Visitor example

    Could you please provide a short example of how to use traverser::Visitor? I'm not sure how to match on &mut dyn Node values when traversing the AST.
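
The actual signature of php-parser-rs's traverser::Visitor is not shown here, so the following is only a generic sketch of one common Rust technique for this problem: have each node expose itself as &mut dyn Any, then downcast to the concrete type you care about. Node, Visitor, walk, FunctionCall, StringLiteral, and CallCollector are all hypothetical names. The real crate may offer a different (for example, enum-based) way to match nodes.

```rust
use std::any::Any;

/// Hypothetical node trait: each node can expose itself as `dyn Any`
/// (for downcasting) and list its children.
trait Node {
    fn as_any_mut(&mut self) -> &mut dyn Any;
    fn children(&mut self) -> Vec<&mut dyn Node>;
}

struct FunctionCall {
    name: String,
    args: Vec<Box<dyn Node>>,
}

struct StringLiteral {
    value: String,
}

impl Node for FunctionCall {
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
    fn children(&mut self) -> Vec<&mut dyn Node> {
        self.args.iter_mut().map(|arg| &mut **arg as &mut dyn Node).collect()
    }
}

impl Node for StringLiteral {
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
    fn children(&mut self) -> Vec<&mut dyn Node> {
        Vec::new()
    }
}

trait Visitor {
    fn visit(&mut self, node: &mut dyn Node);
}

/// Depth-first traversal: visit the node, then each of its children.
fn walk(visitor: &mut dyn Visitor, node: &mut dyn Node) {
    visitor.visit(node);
    for child in node.children() {
        walk(visitor, child);
    }
}

/// Example visitor: records called function names and upper-cases string
/// literals in place, to show that `&mut` access works after downcasting.
struct CallCollector {
    calls: Vec<String>,
}

impl Visitor for CallCollector {
    fn visit(&mut self, node: &mut dyn Node) {
        let any = node.as_any_mut();
        if let Some(call) = any.downcast_mut::<FunctionCall>() {
            self.calls.push(call.name.clone());
        } else if let Some(literal) = any.downcast_mut::<StringLiteral>() {
            literal.value = literal.value.to_uppercase();
        }
    }
}
```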

    opened by DMS-CPP 3
  • keep comments

    Currently, comments are skipped everywhere; instead, we should:

    1. collect comments on every state.next() call
    2. retrieve comments and place them on the next statement.
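
The two steps above could look like the following minimal sketch. Token, State, Statement, and parse are hypothetical stand-ins; the real parser's state.next() and statement types differ. The toy grammar consists only of name; statements, and each statement receives every comment collected since the previous one.

```rust
/// Hypothetical token type for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Comment(String),
    Ident(String),
    Semicolon,
}

/// Hypothetical parser state: `next()` skips comments but buffers them.
struct State {
    tokens: Vec<Token>,
    pos: usize,
    pending_comments: Vec<String>,
}

impl State {
    fn new(tokens: Vec<Token>) -> Self {
        State { tokens, pos: 0, pending_comments: Vec::new() }
    }

    /// Step 1: collect comments on every next() call instead of dropping them.
    fn next(&mut self) -> Option<Token> {
        while let Some(token) = self.tokens.get(self.pos).cloned() {
            self.pos += 1;
            match token {
                Token::Comment(text) => self.pending_comments.push(text),
                other => return Some(other),
            }
        }
        None
    }

    /// Step 2: hand the buffered comments to whoever builds the next statement.
    fn take_comments(&mut self) -> Vec<String> {
        std::mem::take(&mut self.pending_comments)
    }
}

/// Hypothetical statement node that carries its leading comments.
#[derive(Debug, PartialEq)]
struct Statement {
    comments: Vec<String>,
    name: String,
}

/// Parses a toy grammar of `name;` statements, attaching preceding comments.
fn parse(state: &mut State) -> Vec<Statement> {
    let mut statements = Vec::new();
    while let Some(token) = state.next() {
        if let Token::Ident(name) = token {
            statements.push(Statement { comments: state.take_comments(), name });
        }
        // Semicolons need no action in this toy grammar.
    }
    statements
}
```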
    enhancement parser priority: high 
    opened by azjezz 0
  • Add an AST printer

    An AST printer should be able to take an AST generated by the parser and reconstruct the original PHP source code.

    This will be especially useful for some tools I have planned.
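
Here is a toy sketch of the idea, using a hypothetical mini AST (Expr, Stmt) instead of php-parser-rs's real node types. Reconstructing the *original* source exactly would also require keeping whitespace and comments, which this sketch does not do; it prints a normalized form. Parentheses are emitted around nested binary operands, so the output preserves the tree's grouping without needing a precedence table.

```rust
use std::fmt::Write;

/// Hypothetical toy AST (not php-parser-rs's node types).
enum Expr {
    Int(i64),
    Var(String),
    Binary(Box<Expr>, &'static str, Box<Expr>),
    Call(String, Vec<Expr>),
}

enum Stmt {
    Assign(String, Expr),
    Echo(Vec<Expr>),
    Return(Option<Expr>),
}

fn print_list(exprs: &[Expr], out: &mut String) {
    for (i, e) in exprs.iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        print_expr(e, out);
    }
}

fn print_expr(e: &Expr, out: &mut String) {
    match e {
        Expr::Int(n) => write!(out, "{}", n).unwrap(),
        Expr::Var(name) => write!(out, "${}", name).unwrap(),
        Expr::Binary(l, op, r) => {
            print_operand(l, out);
            write!(out, " {} ", op).unwrap();
            print_operand(r, out);
        }
        Expr::Call(name, args) => {
            out.push_str(name);
            out.push('(');
            print_list(args, out);
            out.push(')');
        }
    }
}

/// Wraps nested binary expressions in parentheses to keep the tree's grouping.
fn print_operand(e: &Expr, out: &mut String) {
    if let Expr::Binary(..) = e {
        out.push('(');
        print_expr(e, out);
        out.push(')');
    } else {
        print_expr(e, out);
    }
}

fn print_program(stmts: &[Stmt]) -> String {
    let mut out = String::from("<?php\n\n");
    for stmt in stmts {
        match stmt {
            Stmt::Assign(name, e) => {
                write!(out, "${} = ", name).unwrap();
                print_expr(e, &mut out);
            }
            Stmt::Echo(exprs) => {
                out.push_str("echo ");
                print_list(exprs, &mut out);
            }
            Stmt::Return(value) => {
                out.push_str("return");
                if let Some(e) = value {
                    out.push(' ');
                    print_expr(e, &mut out);
                }
            }
        }
        out.push_str(";\n");
    }
    out
}
```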

    enhancement future priority: medium 
    opened by ryangjchandler 0
Releases(v0.1.0)
  • v0.1.0(Jan 6, 2023)

    Initial release

    The initial alpha release of the parser supports essentially all of PHP's syntax.

    There are still some features missing, such as tracking and storing comments on all nodes. These will be added in a future release.

Owner
PHP Rust Tools
A collection of tools for PHP written in Rust.
Yet Another Parser library for Rust. A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing strings and slices.

Yap: Yet another (rust) parsing library A lightweight, dependency free, parser combinator inspired set of utility methods to help with parsing input.

James Wilson 117 Dec 14, 2022
Website for Microformats Rust parser (using 'microformats-parser'/'mf2')

Website for Microformats Rust parser (using 'microformats-parser'/'mf2')

Microformats 5 Jul 19, 2022
PEG parser for YAML written in Rust 🦀

yaml-peg PEG parser (pest) for YAML written in Rust 🦀 Quick Start ⚡️ # Run cargo run -- --file example_files/test.yaml # Output { "xmas": "true",

Visarut Phusua 4 Sep 17, 2022
MRT/BGP data parser written in Rust.

BGPKIT Parser BGPKIT Parser aims to provide the most ergonomic MRT/BGP message parsing Rust API. BGPKIT Parser has the following features: performant

BGPKIT 46 Dec 19, 2022
gors is an experimental go toolchain written in rust (parser, compiler).

gors gors is an experimental go toolchain written in rust (parser, compiler). Install Using git This method requires the Rust toolchain to be installe

Aymeric Beaumet 12 Dec 14, 2022
🕑 A personal git log and MacJournal output parser, written in rust.

🕑 git log and MacJournal export parser A personal project, written in rust. WORK IN PROGRESS; NOT READY This repo consolidates daily activity from tw

Steven Black 4 Aug 17, 2022
A CSS parser, transformer, and minifier written in Rust.

@parcel/css A CSS parser, transformer, and minifier written in Rust. Features Extremely fast – Parsing and minifying large files is completed in milli

Parcel 3.1k Jan 9, 2023
A WIP svelte parser written in rust. Designed with error recovery and reporting in mind

Svelte(rs) A WIP parser for svelte files that is designed with error recovery and reporting in mind. This is mostly a toy project for now, with some v

James Birtles 3 Apr 19, 2023
A native Rust port of Google's robots.txt parser and matcher C++ library.

robotstxt A native Rust port of Google's robots.txt parser and matcher C++ library. Native Rust port, no third-part crate dependency Zero unsafe code

Folyd 72 Dec 11, 2022
Rust parser combinator framework

nom, eating data byte by byte nom is a parser combinators library written in Rust. Its goal is to provide tools to build safe parsers without compromi

Geoffroy Couprie 7.6k Jan 7, 2023
Parsing Expression Grammar (PEG) parser generator for Rust

Parsing Expression Grammars in Rust Documentation | Release Notes rust-peg is a simple yet flexible parser generator that makes it easy to write robus

Kevin Mehall 1.2k Dec 30, 2022
A fast monadic-style parser combinator designed to work on stable Rust.

Chomp Chomp is a fast monadic-style parser combinator library designed to work on stable Rust. It was written as the culmination of the experiments de

Martin Wernstål 228 Oct 31, 2022
A parser combinator library for Rust

combine An implementation of parser combinators for Rust, inspired by the Haskell library Parsec. As in Parsec the parsers are LL(1) by default but th

Markus Westerlind 1.1k Dec 28, 2022
LR(1) parser generator for Rust

LALRPOP LALRPOP is a Rust parser generator framework with usability as its primary goal. You should be able to write compact, DRY, readable grammars.

null 2.4k Jan 7, 2023
A typed parser generator embedded in Rust code for Parsing Expression Grammars

Oak Compiled on the nightly channel of Rust. Use rustup for managing compiler channels. You can download and set up the exact same version of the comp

Pierre Talbot 138 Nov 25, 2022
Rust query string parser with nesting support

What is Queryst? This is a fork of the original, with serde and serde_json updated to 0.9 A query string parsing library for Rust inspired by https://

Stanislav Panferov 67 Nov 16, 2022
Soon to be AsciiDoc parser implemented in rust!

pagliascii "But ASCII Doc, I am Pagliascii" Soon to be AsciiDoc parser implemented in rust! This project is the current implementation of the requeste

Lukas Wirth 49 Dec 11, 2022
This project aims to implement a CSS(less like) parser in rust. Currently the code is targeting the PostCSS AST.

CSS(less like) parser written in rust (WIP) This project aims to implement a CSS(less like) parser in rust. Currently the code is targeting the PostCS

Huang Liuhaoran 21 Aug 23, 2022