Fast and robust e-mail parsing library for Rust

Overview

mail-parser


mail-parser is an e-mail parsing library written in Rust that fully conforms to the Internet Message Format standard (RFC 5322), the Multipurpose Internet Mail Extensions (MIME; RFC 2045 - 2049) as well as other internet messaging RFCs.

It also supports decoding messages in 41 different character sets, including obsolete formats such as UTF-7. All Unicode (UTF-*) and single-byte character sets are handled internally by the library. Support for legacy multi-byte encodings of Chinese, Japanese and Korean text, such as BIG5 or ISO-2022-JP, is provided by the optional dependency encoding_rs.
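
In a single-byte character set, each byte maps to exactly one Unicode code point, so decoding only needs a per-byte mapping. The sketch below illustrates this with a hypothetical `decode_iso_8859_15` helper. It is not mail-parser's internal code. It reuses the ISO-8859-1 mapping, where byte `0xNN` is code point `U+00NN`, and overrides the eight positions where ISO-8859-15 differs:

```rust
/// Decodes ISO-8859-15 bytes into a String.
/// Hypothetical helper for illustration; not part of mail-parser's API.
fn decode_iso_8859_15(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            // The eight positions where ISO-8859-15 differs from ISO-8859-1.
            0xA4 => '\u{20AC}', // EURO SIGN
            0xA6 => '\u{0160}', // LATIN CAPITAL LETTER S WITH CARON
            0xA8 => '\u{0161}', // LATIN SMALL LETTER S WITH CARON
            0xB4 => '\u{017D}', // LATIN CAPITAL LETTER Z WITH CARON
            0xB8 => '\u{017E}', // LATIN SMALL LETTER Z WITH CARON
            0xBC => '\u{0152}', // LATIN CAPITAL LIGATURE OE
            0xBD => '\u{0153}', // LATIN SMALL LIGATURE OE
            0xBE => '\u{0178}', // LATIN CAPITAL LETTER Y WITH DIAERESIS
            // Every other byte has the same code point as in ISO-8859-1.
            _ => char::from(b),
        })
        .collect()
}
```

For example, `decode_iso_8859_15(b"Prix: 5\xA4")` returns `"Prix: 5€"`. Other single-byte charsets follow the same pattern with their own mapping tables.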

In general, this library abides by Postel's law, also known as the Robustness Principle. It states that an implementation must be conservative in its sending behavior and liberal in its receiving behavior. This means that mail-parser will make a best effort to parse non-conformant e-mail messages as long as they do not deviate too much from the standard.

Unlike other e-mail parsing libraries that return nested representations of the different MIME parts in a message, this library conforms to RFC 8621, Section 4.1.4. It provides a more human-friendly representation of the message contents, consisting of just text body parts, HTML body parts and attachments. Additionally, when the alternative version of an inline body part is missing, it is generated automatically by converting HTML to plain text or vice versa.

Performance and memory safety were two important factors while designing mail-parser:

  • Zero-copy: Practically all strings returned by this library are Cow<str> references to the raw input message.
  • High performance Base64 decoding based on Chromium's decoder (the fastest non-SIMD decoder).
  • Fast parsing of message header fields, character set names and HTML entities using perfect hashing.
  • Written in 100% safe Rust with no external dependencies.
  • Every function in the library has been fuzzed and meticulously tested with MIRI.
  • Thoroughly battle-tested with millions of real-world e-mail messages dating from 1995 until today.
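
The zero-copy approach can be illustrated with `std::borrow::Cow`. A value is borrowed from the raw message when it can be returned unchanged. An owned `String` is allocated only when decoding actually modifies the value. In the sketch below, `unfold` is a hypothetical helper, not mail-parser's API. It unfolds a header value by removing the line breaks of folded lines, as described in RFC 5322, Section 2.2.3:

```rust
use std::borrow::Cow;

/// Unfolds a header value by removing CR and LF characters.
/// Hypothetical helper: assumes every line break inside the value is a fold
/// (a line break followed by a space or tab, which is kept).
fn unfold(value: &str) -> Cow<'_, str> {
    if !value.contains(|c: char| c == '\r' || c == '\n') {
        // Nothing to change: borrow from the input without allocating.
        Cow::Borrowed(value)
    } else {
        // Folding present: allocate a new string without the CR/LF characters.
        Cow::Owned(value.chars().filter(|&c| c != '\r' && c != '\n').collect())
    }
}
```

Most header values are not folded, so the common case returns a borrowed slice and allocates nothing.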

Usage Example

    use mail_parser::*;

    let input = concat!(
        "From: Art Vandelay <[email protected]> (Vandelay Industries)\n",
        "To: \"Colleagues\": \"James Smythe\" <[email protected]>; Friends:\n",
        "    [email protected], =?UTF-8?Q?John_Sm=C3=AEth?= <[email protected]>;\n",
        "Date: Sat, 20 Nov 2021 14:22:01 -0800\n",
        "Subject: Why not both importing AND exporting? =?utf-8?b?4pi6?=\n",
        "Content-Type: multipart/mixed; boundary=\"festivus\";\n\n",
        "--festivus\n",
        "Content-Type: text/html; charset=\"us-ascii\"\n",
        "Content-Transfer-Encoding: base64\n\n",
        "PGh0bWw+PHA+SSB3YXMgdGhpbmtpbmcgYWJvdXQgcXVpdHRpbmcgdGhlICZsZHF1bztle\n",
        "HBvcnRpbmcmcmRxdW87IHRvIGZvY3VzIGp1c3Qgb24gdGhlICZsZHF1bztpbXBvcnRpbm\n",
        "cmcmRxdW87LDwvcD48cD5idXQgdGhlbiBJIHRob3VnaHQsIHdoeSBub3QgZG8gYm90aD8\n",
        "gJiN4MjYzQTs8L3A+PC9odG1sPg==\n",
        "--festivus\n",
        "Content-Type: message/rfc822\n\n",
        "From: \"Cosmo Kramer\" <[email protected]>\n",
        "Subject: Exporting my book about coffee tables\n",
        "Content-Type: multipart/mixed; boundary=\"giddyup\";\n\n",
        "--giddyup\n",
        "Content-Type: text/plain; charset=\"utf-16\"\n",
        "Content-Transfer-Encoding: quoted-printable\n\n",
        "=FF=FE=0C!5=D8\"=DD5=D8)=DD5=D8-=DD =005=D8*=DD5=D8\"=DD =005=D8\"=\n",
        "=DD5=D85=DD5=D8-=DD5=D8,=DD5=D8/=DD5=D81=DD =005=D8*=DD5=D86=DD =\n",
        "=005=D8=1F=DD5=D8,=DD5=D8,=DD5=D8(=DD =005=D8-=DD5=D8)=DD5=D8\"=\n",
        "=DD5=D8=1E=DD5=D80=DD5=D8\"=DD!=00\n",
        "--giddyup\n",
        "Content-Type: image/gif; name*1=\"about \"; name*0=\"Book \";\n",
        "              name*2*=utf-8''%e2%98%95 tables.gif\n",
        "Content-Transfer-Encoding: Base64\n",
        "Content-Disposition: attachment\n\n",
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\n",
        "--giddyup--\n",
        "--festivus--\n",
    )
    .as_bytes();

    let message = Message::parse(input);

    // Parses addresses (including comments), lists and groups
    assert_eq!(
        message.get_from(),
        &Address::Address(Addr {
            name: Some("Art Vandelay (Vandelay Industries)".into()),
            address: Some("[email protected]".into())
        })
    );
    assert_eq!(
        message.get_to(),
        &Address::GroupList(vec![
            Group {
                name: Some("Colleagues".into()),
                addresses: vec![Addr {
                    name: Some("James Smythe".into()),
                    address: Some("[email protected]".into())
                }]
            },
            Group {
                name: Some("Friends".into()),
                addresses: vec![
                    Addr {
                        name: None,
                        address: Some("[email protected]".into())
                    },
                    Addr {
                        name: Some("John Smîth".into()),
                        address: Some("[email protected]".into())
                    }
                ]
            }
        ])
    );

    assert_eq!(
        message.get_date().unwrap().to_iso8601(),
        "2021-11-20T14:22:01-08:00"
    );

    // RFC2047 support for encoded text in message headers
    assert_eq!(
        message.get_subject().unwrap(),
        "Why not both importing AND exporting? ☺"
    );

    // HTML and text body parts are returned conforming to RFC8621, Section 4.1.4 
    assert_eq!(
        message.get_html_body(0).unwrap().to_string(),
        concat!(
            "<html><p>I was thinking about quitting the &ldquo;exporting&rdquo; to ",
            "focus just on the &ldquo;importing&rdquo;,</p><p>but then I thought,",
            " why not do both? &#x263A;</p></html>"
        )
    );

    // HTML parts are converted to plain text (and vice versa) when missing
    assert_eq!(
        message.get_text_body(0).unwrap().to_string(),
        concat!(
            "I was thinking about quitting the “exporting” to focus just on the",
            " “importing”,\nbut then I thought, why not do both? ☺\n"
        )
    );

    // Supports nested messages as well as multipart/digest
    let nested_message = match message.get_attachment(0).unwrap() {
        MessagePart::Message(v) => v,
        _ => unreachable!(),
    };

    assert_eq!(
        nested_message.get_subject().unwrap(),
        "Exporting my book about coffee tables"
    );

    // Handles UTF-* as well as many legacy encodings
    assert_eq!(
        nested_message.get_text_body(0).unwrap().to_string(),
        "ℌ𝔢𝔩𝔭 𝔪𝔢 𝔢𝔵𝔭𝔬𝔯𝔱 𝔪𝔶 𝔟𝔬𝔬𝔨 𝔭𝔩𝔢𝔞𝔰𝔢!"
    );
    assert_eq!(
        nested_message.get_html_body(0).unwrap().to_string(),
        "<html><body>ℌ𝔢𝔩𝔭 𝔪𝔢 𝔢𝔵𝔭𝔬𝔯𝔱 𝔪𝔶 𝔟𝔬𝔬𝔨 𝔭𝔩𝔢𝔞𝔰𝔢!</body></html>"
    );

    let nested_attachment = match nested_message.get_attachment(0).unwrap() {
        MessagePart::Binary(v) => v,
        _ => unreachable!(),
    };

    assert_eq!(nested_attachment.len(), 42);

    // Full RFC2231 support for continuations and character sets
    assert_eq!(
        nested_attachment
            .get_header()
            .unwrap()
            .get_content_type()
            .unwrap()
            .get_attribute("name")
            .unwrap(),
        "Book about ☕ tables.gif"
    );

    // Integrates with Serde
    println!("{}", serde_json::to_string_pretty(&message).unwrap());
    println!("{}", serde_yaml::to_string(&message).unwrap());
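
The RFC 2231 assertion in the example depends on two steps:

  • sorting the numbered parameter sections (`name*0`, `name*1`, `name*2*`) by index;
  • percent-decoding the sections whose key ends in `*`.

The sketch below shows both steps. Its assumptions:

  • `join_continuations` and `percent_decode` are hypothetical helpers, not mail-parser's API.
  • Parameter values have already been unquoted.
  • The result is interpreted as UTF-8 regardless of the declared charset.

RFC 2231 only allows the `charset'language'` prefix on the first section. The sketch, like the example input, also accepts it on later extended sections.

```rust
use std::collections::BTreeMap;

/// Value of a single hex digit, or None.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes %XX escapes into raw bytes; invalid escapes are kept literally.
fn percent_decode(value: &str) -> Vec<u8> {
    let bytes = value.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi << 4 | lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// Joins the "name*N" / "name*N*" sections of parameter `name` in numeric order.
/// Hypothetical, simplified helper: only UTF-8 output is produced.
fn join_continuations(params: &[(&str, &str)], name: &str) -> Option<String> {
    let mut sections: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    for &(key, value) in params {
        // Expect keys like "name*0" (plain) or "name*2*" (extended).
        let Some(rest) = key.strip_prefix(name).and_then(|r| r.strip_prefix('*')) else {
            continue;
        };
        let (index, extended) = match rest.strip_suffix('*') {
            Some(index) => (index, true),
            None => (rest, false),
        };
        let Ok(index) = index.parse::<u32>() else { continue };
        let bytes = if extended {
            // Drop an optional charset'language' prefix such as "utf-8''".
            let encoded = value.splitn(3, '\'').nth(2).unwrap_or(value);
            percent_decode(encoded)
        } else {
            value.as_bytes().to_vec()
        };
        sections.insert(index, bytes);
    }
    if sections.is_empty() {
        return None;
    }
    // BTreeMap iterates in ascending index order: section 0, 1, 2, ...
    let joined: Vec<u8> = sections.into_values().flatten().collect();
    Some(String::from_utf8_lossy(&joined).into_owned())
}
```

Take the parameters from the example: `name*1="about "`, `name*0="Book "` and `name*2*=utf-8''%e2%98%95 tables.gif`. With these, the sketch yields `Book about ☕ tables.gif`.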

Testing, Fuzzing & Benchmarking

To run the test suite:

 $ cargo test --all-features

or, to run the test suite with MIRI:

 $ cargo +nightly miri test --all-features

To fuzz the library with cargo-fuzz:

 $ cargo +nightly fuzz run mail_parser

and, to run the benchmarks:

 $ cargo +nightly bench --all-features

Conformed RFCs

Supported Character Sets

  • UTF-8
  • UTF-16, UTF-16BE, UTF-16LE
  • UTF-7
  • US-ASCII
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-3
  • ISO-8859-4
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • ISO-8859-10
  • ISO-8859-13
  • ISO-8859-14
  • ISO-8859-15
  • ISO-8859-16
  • CP1250
  • CP1251
  • CP1252
  • CP1253
  • CP1254
  • CP1255
  • CP1256
  • CP1257
  • CP1258
  • KOI8-R
  • KOI8-U
  • MACINTOSH
  • IBM850
  • TIS-620

Character sets supported via the optional dependency encoding_rs:

  • SHIFT_JIS
  • BIG5
  • EUC-JP
  • EUC-KR
  • GB18030
  • GBK
  • ISO-2022-JP
  • WINDOWS-874
  • IBM-866

License

Licensed under either of

at your option.

Copyright

Copyright (C) 2020-2022, Stalwart Labs, Minter Ltd.

See COPYING for the license.

Comments
  • Support for malformed unstructured fields containing encoded words

    I've come across a number of emails where the subject, which contains encoded words, was modified by the recipients' mail server such that the final subject became something like:

    [SUSPECTED SPAM]=?utf-8?B?VGhpcyBpcyB0aGUgb3JpZ2luYWwgc3ViamVjdA==?=
    

    I understand that this does not get decoded because it is missing the whitespace before the encoded word, which the spec (RFC 2047) requires:

    Ordinary ASCII text and 'encoded-word's may appear together in the same header field. However, an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'.

    Would it be possible to add support for parsing these types of malformed fields, seeing as mail servers which do this are relatively common?

    opened by BryanLeong 10
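
A lenient decoder could search for `=?charset?encoding?text?=` anywhere in the field, not only after whitespace. The sketch below is hypothetical and is not mail-parser's implementation. It has these limits:

  • It only decodes UTF-8 words using the `B` (base64) encoding; anything else is left as-is.
  • It does not remove whitespace between adjacent encoded words.
  • It uses a simple bit-accumulator base64 decoder, not the Chromium-style table decoder described above.

```rust
/// Decodes standard base64 (RFC 4648), ignoring '=' padding.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    fn value(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::new();
    let mut buffer = 0u32; // low `bits` bits hold pending, not yet emitted data
    let mut bits: u32 = 0;
    for &c in input.as_bytes().iter().filter(|&&c| c != b'=') {
        buffer = (buffer << 6) | value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8); // `as u8` keeps the low 8 bits
        }
    }
    Some(out)
}

/// Parses one "=?charset?encoding?text?=" word at the start of `word`.
/// Returns the decoded text and the number of bytes consumed.
fn parse_word(word: &str) -> Option<(String, usize)> {
    let body = word.strip_prefix("=?")?;
    let (charset, body) = body.split_once('?')?;
    let (encoding, body) = body.split_once('?')?;
    let end = body.find("?=")?;
    if !charset.eq_ignore_ascii_case("utf-8") || !encoding.eq_ignore_ascii_case("b") {
        return None;
    }
    let decoded = String::from_utf8(base64_decode(&body[..end])?).ok()?;
    // "=?" + charset + "?" + encoding + "?" + text + "?="
    let consumed = 2 + charset.len() + 1 + encoding.len() + 1 + end + 2;
    Some((decoded, consumed))
}

/// Decodes encoded words wherever they appear, even without surrounding whitespace.
fn decode_lenient(field: &str) -> String {
    let mut out = String::new();
    let mut rest = field;
    while let Some(start) = rest.find("=?") {
        out.push_str(&rest[..start]);
        match parse_word(&rest[start..]) {
            Some((decoded, consumed)) => {
                out.push_str(&decoded);
                rest = &rest[start + consumed..];
            }
            None => {
                // Not a decodable word: keep "=?" literally and keep scanning.
                out.push_str("=?");
                rest = &rest[start + 2..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

On the subject from the issue, `decode_lenient` returns `[SUSPECTED SPAM]This is the original subject`.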
  • MessagePart::Message as u8 slice

    I would like to be able to get a MessagePart::Message as something like a u8 slice. My use case is saving the attached message as an .eml file with the original file name. Currently I can only get the individual parts of the attached message.

    Thanks!

    opened by bogct0mculhl 10
  • Retrieving headers in message order

    I am attempting to use this library to create a new message, but I want to keep all the existing headers in the same order they were present in the original message so that things like Received headers don't move around compared to the rest of the headers.

    Is this a use case that was considered for this library?

    I am not sure of the best way to build it, since right now it means iterating over all of the RawHeaders and sorting them by their offsets.

    opened by bertjwregeer 7
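
The workaround described in the issue, sorting raw headers by byte offset, is short once each header carries its offset in the original message. The `RawHeader` struct below is hypothetical and only models that idea. It is not the library's type.

```rust
/// Hypothetical raw header: name, raw value and byte offset in the message.
#[derive(Debug, Clone, PartialEq)]
struct RawHeader<'a> {
    name: &'a str,
    value: &'a str,
    offset: usize,
}

/// Returns the headers in the order they appeared in the original message.
fn in_message_order<'a>(mut headers: Vec<RawHeader<'a>>) -> Vec<RawHeader<'a>> {
    // Each header starts at a distinct offset, so an unstable sort is enough.
    headers.sort_unstable_by_key(|h| h.offset);
    headers
}
```

This keeps repeated fields such as `Received` in their original relative positions, which is what a re-serializer needs.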
  • API guidelines: avoid get_() prefixes

    The API guidelines recommend avoiding get_() prefixes:

    https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter

    These seem to be very common in the mail-parser API, and it sticks out as a little unidiomatic. Maybe consider renaming?

    opened by djc 6
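
For reference, the linked guideline (C-GETTER) names a getter after the field it returns, with no `get_` prefix. The guidelines keep the plain `get` name for cases with one obvious value to return, such as `Cell::get`. The `Envelope` type below is hypothetical and only illustrates the naming:

```rust
/// Hypothetical type used only to illustrate getter naming.
struct Envelope {
    subject: Option<String>,
}

impl Envelope {
    /// Idiomatic getter name: `subject()`, not `get_subject()`.
    fn subject(&self) -> Option<&str> {
        self.subject.as_deref()
    }
}
```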
  • Message::parse() panics!

    Hi guys,

    Great little library!! But...

    This example:

    use mail_parser::Message;
    
    const INPUT: &[u8] = br#"From: <[email protected]>
    Subject: Redacted
    To: <[email protected]>
    Message-ID: <[email protected]>
    MIME-version: 1.0
    Content-Type: text/html;
     charset=utf-8
    Content-Disposition: =?utf-8?Q?invalid?=
    Content-Transfer-Encoding: quoted-printable
    Content-Disposition: =?utf-8?Q?invalid?=
    
    <p>foo</p>"#;
    
    fn main() {
        let message = Message::parse(INPUT);
        dbg!(message);
    }
    

    Will panic with this message:

    thread 'main' panicked at 'HeaderValue::get_content_type called on non-ContentType value', /home/arif/.cargo/registry/src/github.com-1ecc6299db9ec823/mail-parser-0.4.4/src/lib.rs:693:18
    

    This happens even though Message::parse is documented as never panicking and as making a best effort to parse its input.

    Could we have a look into it? Thanks!

    opened by arifd 6
  • Make Datetime sortable?

    I'm trying to sort a list of emails by datetime, and it appears the DateTime struct doesn't allow this -- what do you think the best solution is for this?

    opened by alexwennerberg 5
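
One approach is to normalize each date to seconds since the Unix epoch in UTC and use that value as the sort key. The sketch below makes these assumptions:

  • `MailDate` is a hypothetical struct. Its fields loosely mirror a parsed `Date:` header; it is not mail-parser's `DateTime`.
  • The day count uses the well-known days-from-civil algorithm for the proleptic Gregorian calendar.

```rust
/// Hypothetical parsed Date header, e.g. "Sat, 20 Nov 2021 14:22:01 -0800".
#[derive(Debug, Clone, Copy)]
struct MailDate {
    year: i64,
    month: u32, // 1..=12
    day: u32,   // 1..=31
    hour: u32,
    minute: u32,
    second: u32,
    tz_offset_minutes: i64, // "-0800" => -480
}

/// Days since 1970-01-01 for a proleptic Gregorian date (days-from-civil).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (m as i64 + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d as i64 - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

impl MailDate {
    /// Seconds since the Unix epoch, normalized to UTC.
    fn to_unix(&self) -> i64 {
        let days = days_from_civil(self.year, self.month, self.day);
        let local = days * 86_400
            + self.hour as i64 * 3_600
            + self.minute as i64 * 60
            + self.second as i64;
        // Local time = UTC + offset, so subtract the offset to get UTC.
        local - self.tz_offset_minutes * 60
    }
}
```

A list can then be sorted with `dates.sort_by_key(MailDate::to_unix)`. With this key, the same instant written in different time zones compares as equal.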
  • Parsing mbox files?

    The library is great, thanks for putting it together! I was wondering if there are plans to add a parser for reading and parsing files in mbox format?

    opened by onthebridgetonowhere 5
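
In the mbox family of formats, each message begins with a separator line starting with `From ` (the "From_" line). The sketch below is a simplified splitter. Its assumptions:

  • It follows the mboxrd convention: when writing, a `>` is added to body lines matching `^>*From `; when reading, one `>` is removed.
  • `split_mbox` is hypothetical and not part of mail-parser.
  • The input is valid UTF-8 (real mbox files may not be).

Each returned message could then be passed to `Message::parse` as bytes.

```rust
/// Splits an mbox buffer into raw messages, undoing mboxrd ">From " quoting.
/// Hypothetical helper: separator lines are dropped, and text before the first
/// separator is ignored.
fn split_mbox(mbox: &str) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current: Option<String> = None;
    for line in mbox.split_inclusive('\n') {
        if line.starts_with("From ") {
            // Separator: finish the previous message and start a new one.
            if let Some(msg) = current.take() {
                messages.push(msg);
            }
            current = Some(String::new());
        } else if let Some(msg) = current.as_mut() {
            // mboxrd: strip one '>' from lines matching ^>+From
            if line.starts_with('>') && line.trim_start_matches('>').starts_with("From ") {
                msg.push_str(&line[1..]);
            } else {
                msg.push_str(line);
            }
        }
    }
    if let Some(msg) = current {
        messages.push(msg);
    }
    messages
}
```

The blank line that mbox writers place before each separator stays attached to the preceding message in this sketch.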
  • What represents MessagePart::Multipart

    I noticed that in your most recent release (0.3.1) you added a new MessagePart variant, MessagePart::Multipart. How should I handle it when rendering attachments? Is there a test case for this?

    opened by arjentz 5
  • MHTML Snapshot of Google Fails To Parse CSS Attachments

    Hello, I have an issue with MHTML parsing from snapshots taken with the Chrome API.

    Given the following file found here:

    https://gist.github.com/quesurifn/f6d5c7068a916b9f46927a01fd87ed36

    When iterating through the attachments and parsing each one, the first attachment passed to for_each is the CSS part, e.g.:

    "sM5MNb" aria-live=3D"polite" class=3D"SaJ9Qe"></div></div></body></html>
    ------MultipartBoundary--iWpBqfSwVqslpqB6usfCGkX1ZOR2Vavef3mICKj83Y----
    Content-Type: text/css
    Content-Transfer-Encoding: quoted-printable
    Content-Location: cid:[email protected]
    
    @charset "utf-8";
    
    html, body, h1, input, select { font-family: arial, sans-serif; }
    
    body, h1 { font-size: 14px; }
    
    h1 { font-weight: normal; margin: 0px; padding: 0px; }
    
    h3 { font-weight: normal; margin: 0px; padding: 0px; font-size: 20px; line-=
    height: 1.3; }
    
    body { margin: 0px; background: rgb(255, 255, 255); color: rgb(32, 33, 36);=
     }
    

    Given the following code:

    use mail_parser::Message;
    use std::{fs, path::Path};

    let path = fs::canonicalize(Path::new("./src/file.txt")).unwrap();
    let foo = fs::read_to_string(path).unwrap_or_default();
    let message = Message::parse(foo.as_bytes()).unwrap();
    let mut body = message.get_html_body(0).unwrap_or_default();
    message.get_html_bodies().for_each(|f| {
        let id = f.get_content_id().unwrap();
        println!("{}", id)
    });
    message.parts.iter().for_each(|f| {
        // Panics here with 'Expected message part.'
        let m = f.parse_message().unwrap();
        let id = m.get_content_id().unwrap();
        println!("{}", id)
    });
    

    With the following stack trace:

    thread 'main' panicked at 'Expected message part.', /Users/kylefahey/.cargo/registry/src/github.com-1ecc6299db9ec823/mail-parser-0.4.8/src/lib.rs:436:18
    stack backtrace:
       0: std::panicking::begin_panic
                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:616:12
       1: mail_parser::MessagePart::parse_message
                 at /Users/kylefahey/.cargo/registry/src/github.com-1ecc6299db9ec823/mail-parser-0.4.8/src/lib.rs:436:18
       2: sandbox::main::{{closure}}
                 at ./src/main.rs:17:18
       3: <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::for_each
                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/slice/iter/macros.rs:211:21
       4: sandbox::main
                 at ./src/main.rs:16:5
       5: core::ops::function::FnOnce::call_once
                 at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
    
    opened by quesurifn 3
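
The stack trace shows that `parse_message()` panics with "Expected message part." when it is called on a part that is not a message. A defensive pattern is to match on the part type and skip the others. The `Part` enum below is a hypothetical stand-in, not mail-parser's `MessagePart`:

```rust
/// Hypothetical stand-in for a parsed MIME part.
enum Part {
    Text { content_id: Option<String> },
    Binary { content_id: Option<String> },
    Message { raw: Vec<u8> },
}

impl Part {
    /// Returns the embedded message for message parts, and None otherwise
    /// instead of panicking.
    fn as_message(&self) -> Option<&[u8]> {
        match self {
            Part::Message { raw } => Some(raw.as_slice()),
            _ => None,
        }
    }

    /// Content-ID of text and binary parts, if present.
    fn content_id(&self) -> Option<&str> {
        match self {
            Part::Text { content_id } | Part::Binary { content_id } => content_id.as_deref(),
            Part::Message { .. } => None,
        }
    }
}
```

With `parts.iter().filter_map(Part::as_message)`, HTML and CSS parts of an MHTML snapshot are skipped rather than causing a panic.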
  • Feature to not use `unsafe`

    I would gladly accept a performance penalty in exchange for not using unsafe code. If there were a feature on this crate to allow that tradeoff, I would use it. As it stands, I cannot use this library because of the use of unsafe.

    enhancement 
    opened by dcormier 3
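
A crate could offer this trade-off through a Cargo feature that chooses between an `unsafe` fast path and a checked safe path. The sketch below makes these assumptions:

  • The feature name `unchecked` is hypothetical and would be declared under `[features]` in Cargo.toml.
  • The `ascii_to_str` helper is also hypothetical.
  • Nothing here implies that mail-parser provides such a feature.

Builds without the feature could additionally declare `#![cfg_attr(not(feature = "unchecked"), forbid(unsafe_code))]` at the crate root.

```rust
/// Converts bytes to &str if they are ASCII.
/// The `unchecked` feature (hypothetical) skips the redundant UTF-8 check.
fn ascii_to_str(data: &[u8]) -> Option<&str> {
    if !data.is_ascii() {
        return None;
    }
    #[cfg(feature = "unchecked")]
    // SAFETY: ASCII bytes are always valid UTF-8.
    let s = unsafe { std::str::from_utf8_unchecked(data) };
    #[cfg(not(feature = "unchecked"))]
    // Safe path: validates again, at a small performance cost.
    let s = std::str::from_utf8(data).ok()?;
    Some(s)
}
```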
  • WIP: Add information to use mail-parser in an IMAP server

    It seems that my previous patch was not entirely correct (for example, it does not work with DecodeResult::Borrowed). So I have rewritten it to use only stream.pos and state.mime_boundary.as_ref(). I am not fully satisfied yet. The specific problem: when an RFC822 message is contained in an RFC5322 message, mail-parser wrongly includes the parent's MIME delimiter in the child RFC822 message. I want to fix this and validate all this logic against your test e-mails.

    In retrospect, I am not sure that offset_last_part is really needed, but having a body_raw in Part is required: an IMAP server must not decode content, and it must return the raw size, not the decoded size.

    Also, I am a bit worried about my new fields: they might break your tests, so we need to check that and maybe update them. I will not be available in the next few weeks but hope to finish this work by the end of the summer.

    In the end, I think you should take this information into account if you want to make a release during this time: my previous PR might not be that great. If you really need to make a release, I would recommend that you roll it back.

    Sorry, I thought that the problem might be easier to address.

    opened by superboum 2
  • Inlining images

    First of all, thanks a lot for this library and the great amount of work that went into it🙏🏻

    As far as I can tell, images included as inline attachments are not automatically inlined in the resulting HTML body string.

    For example, this element is in an email by Confluence:

    <img style=3D"vertical-align: top; display: block;" src=3D"cid:page-icon" alt=3D"page icon" title=3D="page icon" height=3D"16" width=3D"16" border=3D"0">
    

    This references an attachment, page-icon, by Content-ID which is appended later in the email body:

    ------=_Part_565_1697167297.1654852801463
    Content-Type: image/png; name=page-icon.png
    Content-Transfer-Encoding: base64
    Content-ID: <page-icon>
    Content-Disposition: inline; filename=page-icon.png
    
    iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAMAAAAoLQ9TAAAAPFBMVEX///+1tbWwsLCtra3////5
    +fmLi4vZ2dnT09P8/PzPz8+rq6uhoaHR0dFycnJwcHB6enp4eHiDg4OAgIBog/vRAAAADnRSTlMA
    IiJV3e7u7u7u7u7u7rDOyYEAAABUSURBVHhepcpLDoAwCABRqkBbP9Dq/e9qLYS1ibN8GQBYWFVG
    fQWLWyFEJG0uknGmuz+CDnjYEzDqDpF8BrV+HBRHNThjyBP42qpBufmFxOIpJ3gAPTUGaYiilrsA
    AAAASUVORK5CYII=
    ------=_Part_565_1697167297.1654852801463
    

    Now, when I get the HTML body of the email like this:

    message.get_html_body(0).unwrap()
    

    ...the image source still says cid:page-icon. Is there a way to automatically inline those attachments? Or is this something that should be handled by the user?

    Other libraries, such as the Nodemailer mailparser, do this automatically.

    enhancement 
    opened by MaxGfeller 5
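
The caller can do the inlining after extracting the HTML body. Each `src="cid:ID"` reference is replaced with a `data:` URI built from the matching attachment's content type and base64-encoded bytes. The sketch below makes these assumptions:

  • The `InlinePart` type and the `inline_cid_images` and `base64_encode` functions are hypothetical, not mail-parser APIs.
  • The HTML has already been decoded from quoted-printable, so the attribute reads `src="cid:page-icon"` rather than `src=3D"cid:page-icon"`.
  • Only exact, double-quoted references are matched.

```rust
/// Hypothetical inline attachment: Content-ID (without <>), MIME type, bytes.
struct InlinePart<'a> {
    content_id: &'a str,
    content_type: &'a str,
    data: &'a [u8],
}

/// Standard base64 encoding (RFC 4648) with '=' padding.
fn base64_encode(data: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to three bytes into a 24-bit group (missing bytes are zero).
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (b[0] as u32) << 16 | (b[1] as u32) << 8 | b[2] as u32;
        for i in 0..4 {
            // A chunk of k bytes yields k + 1 symbols; the rest is padding.
            if i <= chunk.len() {
                out.push(ALPHABET[(n >> (18 - 6 * i) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Replaces src="cid:ID" references with data: URIs for the given parts.
fn inline_cid_images(html: &str, parts: &[InlinePart<'_>]) -> String {
    let mut out = html.to_string();
    for part in parts {
        let reference = format!("src=\"cid:{}\"", part.content_id);
        let data_uri = format!(
            "src=\"data:{};base64,{}\"",
            part.content_type,
            base64_encode(part.data)
        );
        out = out.replace(&reference, &data_uri);
    }
    out
}
```

Rewriting `src` attributes as data URIs makes the HTML self-contained, at the cost of a larger string, since base64 adds roughly a third to the image size.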
Releases: 0.8.0

Owner: Stalwart Labs. We build distributed applications in Rust.