Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

SyntaxDot

Introduction

SyntaxDot is a sequence labeler and dependency parser using Transformer networks. SyntaxDot models can be trained from scratch or using pretrained models, such as BERT or XLM-RoBERTa.

In principle, SyntaxDot can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Named entity recognition

The easiest way to get started with SyntaxDot is to use a pretrained sticker2 model (SyntaxDot is currently compatible with sticker2 models).
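
As a minimal getting-started sketch, annotating a CoNLL-U file with such a model could look roughly as follows. This assumes that syntaxdot annotate takes the model configuration as its first argument and reads and writes CoNLL-U on standard input and output; the file names are placeholders, and the exact interface is documented by syntaxdot annotate --help:

    # Hypothetical invocation: config path and file names are placeholders.
    syntaxdot annotate model/syntaxdot.conf < input.conllu > annotated.conllu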

Features

  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees
    • Simple API to extend to other tasks
    • Dependency parsing as sequence labeling
  • Dependency parsing using deep biaffine attention and MST decoding.
  • Multi-task training and classification using scalar weighting.
  • Encoder models:
    • Transformers
    • Finetuning of BERT, XLM-RoBERTa, ALBERT, and SqueezeBERT models
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Very liberal license

Documentation

References

SyntaxDot uses techniques from or was inspired by the following papers:

Issues

You can report bugs and feature requests in the SyntaxDot issue tracker.

License

For licensing information, see COPYRIGHT.md.

Comments
  • Internal torch error: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend

    Hi, I am trying to train the model. I have tried libtorch 1.9.0 for CUDA 10.2 and 11.1; neither of them seems to work.

    Error: Cannot construct model

    Caused by: Internal torch error: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, Meta, MkldnnCPU, SparseCPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

    CPU: registered at aten/src/ATen/RegisterCPU.cpp:16286 [kernel]
    Meta: registered at aten/src/ATen/RegisterMeta.cpp:9460 [kernel]
    MkldnnCPU: registered at aten/src/ATen/RegisterMkldnnCPU.cpp:563 [kernel]
    SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:959 [kernel]
    BackendSelect: registered at aten/src/ATen/RegisterBackendSelect.cpp:609 [kernel]
    Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
    ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
    AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:9226 [autograd kernel]
    Tracer: registered at ../torch/csrc/autograd/generated/TraceType_4.cpp:9909 [kernel]
    Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:255 [backend fallback]
    Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1019 [backend fallback]
    VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
    
    Exception raised from reportError at ../aten/src/ATen/core/dispatch/OperatorEntry.cpp:399 (most recent call first):
    frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f430d9401d9 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libc10.so)
    frame #1: <unknown function> + 0xf15b20 (0x7f430eaa2b20 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #2: c10::impl::OperatorEntry::reportError(c10::DispatchKey) const + 0x863 (0x7f430eb1f393 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #3: <unknown function> + 0x1adef73 (0x7f430f66bf73 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #4: at::empty(c10::ArrayRef<long>, c10::TensorOptions, c10::optional<c10::MemoryFormat>) + 0x293 (0x7f430f241a13 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #5: at::native::randn(c10::ArrayRef<long>, c10::optional<at::Generator>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x52d (0x7f430eecd39d in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #6: at::native::randn(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x51 (0x7f430eecd4f1 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #7: <unknown function> + 0x1cc0706 (0x7f430f84d706 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #8: <unknown function> + 0x1ad94cd (0x7f430f6664cd in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #9: <unknown function> + 0x1ae2b5b (0x7f430f66fb5b in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #10: at::randn(c10::ArrayRef<long>, c10::TensorOptions) + 0x298 (0x7f430f2a3b48 in /content/drive/MyDrive/SyntaxDot/libtorch/lib/libtorch_cpu.so)
    frame #11: <unknown function> + 0x2de84b (0x55f11554784b in /root/.cargo/bin/syntaxdot)
    frame #12: <unknown function> + 0x2c6553 (0x55f11552f553 in /root/.cargo/bin/syntaxdot)
    frame #13: <unknown function> + 0x2ca34e (0x55f11553334e in /root/.cargo/bin/syntaxdot)
    frame #14: <unknown function> + 0x2cbb6b (0x55f115534b6b in /root/.cargo/bin/syntaxdot)
    frame #15: <unknown function> + 0x2c1bca (0x55f11552abca in /root/.cargo/bin/syntaxdot)
    frame #16: <unknown function> + 0xe8785 (0x55f115351785 in /root/.cargo/bin/syntaxdot)
    frame #17: <unknown function> + 0x1728e0 (0x55f1153db8e0 in /root/.cargo/bin/syntaxdot)
    frame #18: <unknown function> + 0x17b961 (0x55f1153e4961 in /root/.cargo/bin/syntaxdot)
    frame #19: <unknown function> + 0x15e05c (0x55f1153c705c in /root/.cargo/bin/syntaxdot)
    frame #20: <unknown function> + 0xd6781 (0x55f11533f781 in /root/.cargo/bin/syntaxdot)
    frame #21: <unknown function> + 0xc60db (0x55f11532f0db in /root/.cargo/bin/syntaxdot)
    frame #22: <unknown function> + 0xe8f73 (0x55f115351f73 in /root/.cargo/bin/syntaxdot)
    frame #23: <unknown function> + 0xe8f8d (0x55f115351f8d in /root/.cargo/bin/syntaxdot)
    frame #24: <unknown function> + 0x3e39ea (0x55f11564c9ea in /root/.cargo/bin/syntaxdot)
    frame #25: <unknown function> + 0xc6952 (0x55f11532f952 in /root/.cargo/bin/syntaxdot)
    frame #26: __libc_start_main + 0xe7 (0x7f430c714bf7 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #27: <unknown function> + 0x7e7fa (0x55f1152e77fa in /root/.cargo/bin/syntaxdot)
    
    opened by ASB1993 9
  • Internal torch error: Cuda error: no kernel image is available for execution on the device

    Hi,

    during finetuning I did:

    syntaxdot finetune syntaxdot.conf bert-base-german-syntaxdot.pt tuebadz-conllu-new.conllu tuebadz-dev.conllu --gpu 0 --label-smoothing 0.03 --maxlen 100 --warmup 10000
    
    

    which throws the error:

    Error: Cannot construct model

    Caused by: Internal torch error: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Exception raised from distribution_nullary_kernel at /pytorch/aten/src/ATen/native/cuda/DistributionTemplates.h:158 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7fda869591d9 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libc10.so) frame #1: + 0x100c380 (0x7fdb2fd12380 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so) frame #2: void at::native::(anonymous namespace)::distribution_nullary_kernel<float, float, 4, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::TensorIteratorBase&, at::CUDAGeneratorImpl, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>), &(void at::native::templates::cuda::normal_and_transform<float, float, 4ul, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float> >(at::TensorIteratorBase&, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>)), 2u>>, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float> >(at::TensorIteratorBase&, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::TensorIteratorBase&, at::CUDAGeneratorImpl, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>), &(void at::native::templates::cuda::normal_and_transform<float, float, 4ul, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float> >(at::TensorIteratorBase&, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>)), 2u>> const&, __nv_dl_wrapper_t<nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>) + 0x9ae (0x7fdb306d296e in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so) frame #3: void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*) + 0x311 
(0x7fdb306d42b1 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so) frame #4: at::native::normal_kernel(at::Tensor&, double, double, c10::optionalat::Generator) + 0xbf (0x7fdb306d013f in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so) frame #5: + 0x111667e (0x7fda87cbc67e in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #6: at::native::normal(at::Tensor&, double, double, c10::optionalat::Generator) + 0x39 (0x7fda87caf7b9 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #7: + 0x2d66c53 (0x7fdb31a6cc53 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so) frame #8: + 0x2d66d45 (0x7fdb31a6cd45 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so) frame #9: at::Tensor::normal(double, double, c10::optionalat::Generator) const + 0x180 (0x7fda88a991b0 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #10: at::native::randn(c10::ArrayRef, c10::optionalat::Generator, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x56b (0x7fda87ee63db in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #11: at::native::randn(c10::ArrayRef, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x51 (0x7fda87ee64f1 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #12: + 0x1cc0706 (0x7fda88866706 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #13: + 0x1ad94cd (0x7fda8867f4cd in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #14: + 0x1ae2b5b (0x7fda88688b5b in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #15: at::randn(c10::ArrayRef, c10::TensorOptions) + 0x298 (0x7fda882bcb48 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so) frame #16: + 0x2c9da0 (0x563b611a6da0 in syntaxdot) frame #17: + 0x2b6827 (0x563b61193827 in syntaxdot) frame #18: + 0x2b6365 (0x563b61193365 in syntaxdot) frame #19: + 0x2ad5ba (0x563b6118a5ba in syntaxdot) frame #20: + 0x132aac (0x563b6100faac in syntaxdot) frame #21: + 0x17d25e (0x563b6105a25e in syntaxdot) frame #22: + 0x18576d (0x563b6106276d in syntaxdot) frame #23: + 0x15b775 (0x563b61038775 in syntaxdot) frame #24: + 0xf3b21 (0x563b60fd0b21 in syntaxdot) frame #25: + 0x18995f (0x563b6106695f in syntaxdot) frame #26: + 0x18c523 (0x563b61069523 in syntaxdot) frame #27: + 0xe8afd (0x563b60fc5afd in syntaxdot) frame #28: + 0x3bceca (0x563b61299eca in syntaxdot) frame #29: + 0x189e42 (0x563b61066e42 in syntaxdot) frame #30: __libc_start_main + 0xf3 (0x7fda8656b0b3 in /lib/x86_64-linux-gnu/libc.so.6) frame #31: + 0x9be1e (0x563b60f78e1e in syntaxdot)

    I have CUDA on my Ubuntu 20.04 (literally reinstalled everything at least 20 times). Nvcc -V shows: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2020 NVIDIA Corporation Built on Tue_Sep_15_19:10:02_PDT_2020 Cuda compilation tools, release 11.1, V11.1.74 Build cuda_11.1.TC455_06.29069683_0

    I need a 11.1 build for libtorch, so I really need the compiler to be 11.1. Nvidia-smi shows Cuda version 11.4 (and I cannot change it).

    I added CUDA to PATH as described in the docs, and I also added libtorch to PATH and LD_LIBRARY_PATH as suggested in doc/install.

    Don't know what to do. Anyone here to help?

    opened by ASB1993 4
  • libtorch with args "c++" did not execute successfully

    Hey guys,

    I am trying to build SyntaxDot on a Windows computer. I already set up WSL and followed the instructions on the installation page. I added the environment variable and PATH for rustup manually. Nevertheless, when running

      nix-env -f https://github.com/tensordot/syntaxdot/archive/main.tar.gz -iA packages.x86_64-linux.syntaxdot

    it compiles everything, and in the end it throws the following error:

    error occurred: Command "/nix/store/35pnk5kwi26m3ph2bc7dxwjnavpzl8cn-gcc-wrapper-10.3.0/bin/c++" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m64" "-I" "/nix/store/0ripbpgjagf2yfqgzy0p11xx2ph7sjnl-torch-join/include" "-I" "/nix/store/0ripbpgjagf2yfqgzy0p11xx2ph7sjnl-torch-join/include/torch/csrc/api/include" "-Wl,-rpath=/nix/store/0ripbpgjagf2yfqgzy0p11xx2ph7sjnl-torch-join/lib" "-std=c++14" "-D_GLIBCXX_USE_CXX11_ABI=1" "-o" "/tmp/nix-build-syntaxdot-0.4.0.drv-0/syntaxdot/target/x86_64-unknown-linux-gnu/release/build/torch-sys-f8c64a951705a779/out/libtch/torch_api.o" "-c" "libtch/torch_api.cpp" with args "c++" did not execute successfully (status code exit code: 1).
    

    What could I possibly do to change it?

    Thanks in advance!

    opened by ASB1993 2
  • error while loading shared libraries

    Hi, so I tried to prepare the label files as described in https://github.com/tensordot/syntaxdot/blob/main/doc/finetune.md with the command syntaxdot prepare /mnt/c/Users/bartl/Downloads/syntaxdot.conf /mnt/c/Users/bartl/Downloads/train.txt. However, it throws an error, namely: "syntaxdot: error while loading shared libraries: libtorch_cpu.so: cannot open shared object file: No such file or directory". I checked whether pytorch was installed correctly by running python -c "import torch; print(torch.eye(3))" and it gives me output. What could be wrong here?

    opened by ASB1993 1
  • Bump ohnomore from 0.3.0 to 0.4.0

    Bumps ohnomore from 0.3.0 to 0.4.0.

    Commits
    • 686e5eb Bump version to 0.4.0
    • f3cf421 Update to udgraph/conllu 0.7
    • af1ed69 Update dependencies within semver
    • 4de9e4f Update to petgraph 0.6
    • 91d7b49 Relicense under Apache License version 2 or MIT License
    • f9d7ccc Bump thiserror from 1.0.16 to 1.0.24
    • 5f123cb Bump petgraph from 0.5.0 to 0.5.1
    • b92ec85 Bump fst from 0.4.3 to 0.4.6
    • c0aa76c Bump unicode-normalization from 0.1.12 to 0.1.17
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 1
  • Bump conllu from 0.6.0 to 0.7.0

    Bumps conllu from 0.6.0 to 0.7.0.

    Commits

    dependencies 
    opened by dependabot[bot] 1
  • Bump udgraph from 0.6.0 to 0.7.0

    Bumps udgraph from 0.6.0 to 0.7.0.

    Commits

    dependencies 
    opened by dependabot[bot] 1
  • Bump ordered-float from 2.6.0 to 2.7.0

    Bumps ordered-float from 2.6.0 to 2.7.0.

    Release notes

    Sourced from ordered-float's releases.

    v2.7.0

    • New optional dependency proptest (#94).
    Commits

    dependencies 
    opened by dependabot[bot] 1
  • Bump wordpieces from 0.4.1 to 0.5.0

    Bumps wordpieces from 0.4.1 to 0.5.0.

    Commits
    • 4defbf3 Bump version to 0.5.0 after license change
    • 7b888e4 Relicense under Apache License version 2 or MIT License
    • See full diff in compare view

    dependencies 
    opened by dependabot[bot] 1
  • Bump ordered-float from 2.5.1 to 2.6.0

    Bumps ordered-float from 2.5.1 to 2.6.0.

    Release notes

    Sourced from ordered-float's releases.

    v2.6.0

    • Implement Signed for OrderedFloat (#93).
    Commits

    dependencies 
    opened by dependabot[bot] 1
  • Bump thiserror from 1.0.25 to 1.0.26

    Bumps thiserror from 1.0.25 to 1.0.26.

    Release notes

    Sourced from thiserror's releases.

    1.0.26

    Commits
    • 031fea6 Release 1.0.26
    • 245e7cf Suppress nonstandard_macro_braces in generated code
    • 4bbe3ec Ignore buggy nonstandard_macro_braces clippy lint
    • e0628be Ignore doc_markdown clippy false positive
    • a37b5ab Resolve needless_borrow clippy lints
    • 8862629 Delete broken #[deprecated] test
    • See full diff in compare view

    dependencies 
    opened by dependabot[bot] 1
  • cannot use syntaxdot for symbol lookup error

    Hello, I installed SyntaxDot as described in the documentation and no error occurred, but when I tried to execute the syntaxdot binary, I got this error:

    syntaxdot: symbol lookup error: syntaxdot: undefined symbol: _ZN2at3mulERKNS_6TensorERKN3c106ScalarE

    I am using Ubuntu 22.04 on VirtualBox. Thanks for your help, Ondra

    opened by tondach01 7
  • sentencepiece when running cargo install

    Hello,

    I'm looking into whether I can use this library to later build an R wrapper around it, as this seems to be the only software providing functionality similar to UDPipe 2.

    I'm new to Rust, however, and although I've built R wrappers around C++ libraries before (namely UDPipe, in this case https://github.com/bnosac/udpipe, and sentencepiece, in this case https://github.com/bnosac/sentencepiece), I don't know how to fix this sentencepiece build error. Could you indicate what goes wrong and how to solve it here?

    $ cargo install --no-default-features --path syntaxdot-cli
      Installing syntaxdot-cli v0.4.0 (C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\syntaxdot-cli)
        Updating crates.io index
       Compiling sentencepiece-sys v0.7.1
       Compiling torch-sys v0.5.0
    error: failed to run custom build command for `sentencepiece-sys v0.7.1`
    
    Caused by:
      process didn't exit successfully: `C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-0ade1f02ad77b1a5\build-script-build` (exit code: 101)
      --- stdout
      cargo:rerun-if-env-changed=SENTENCEPIECE_NO_PKG_CONFIG
      cargo:rerun-if-env-changed=PKG_CONFIG
      cargo:rerun-if-env-changed=SENTENCEPIECE_STATIC
      cargo:rerun-if-env-changed=SENTENCEPIECE_DYNAMIC
      cargo:rerun-if-env-changed=PKG_CONFIG_ALL_STATIC
      cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
      cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64-pc-windows-msvc
      cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64_pc_windows_msvc
      cargo:rerun-if-env-changed=HOST_PKG_CONFIG_PATH
      cargo:rerun-if-env-changed=PKG_CONFIG_PATH
      cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64-pc-windows-msvc
      cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64_pc_windows_msvc
      cargo:rerun-if-env-changed=HOST_PKG_CONFIG_LIBDIR
      cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR
      cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64-pc-windows-msvc
      cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64_pc_windows_msvc
      cargo:rerun-if-env-changed=HOST_PKG_CONFIG_SYSROOT_DIR
      cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
      running: "cmake" "C:\\Users\\Jan\\.cargo\\registry\\src\\github.com-1ecc6299db9ec823\\sentencepiece-sys-0.7.1\\source" "-G" "Visual Studio 16 2019" "-Thost=x64" "-Ax64" "-DCMAKE_INSTALL_PREFIX=C:\\Users\\Jan\\Dropbox\\Work\\RForgeBNOSAC\\OpenSource\\syntaxdot\\target\\release\\build\\sentencepiece-sys-2d2a243b21575a42\\out" "-DCMAKE_C_FLAGS= -nologo -MD -Brepro" "-DCMAKE_C_FLAGS_RELEASE= -nologo -MD -Brepro" "-DCMAKE_CXX_FLAGS= -nologo -MD -Brepro" "-DCMAKE_CXX_FLAGS_RELEASE= -nologo -MD -Brepro" "-DCMAKE_ASM_FLAGS= -nologo -MD -Brepro" "-DCMAKE_ASM_FLAGS_RELEASE= -nologo -MD -Brepro" "-DCMAKE_BUILD_TYPE=Release"
      -- VERSION: 0.1.96
      -- Selecting Windows SDK version 10.0.18362.0 to target Windows 6.3.9600.
      -- Not Found TCMalloc: TCMALLOC_LIB-NOTFOUND
      -- Configuring done
      -- Generating done
      -- Build files have been written to: C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/OpenSource/syntaxdot/target/release/build/sentencepiece-sys-2d2a243b21575a42/out/build
      running: "cmake" "--build" "." "--target" "install" "--config" "Release" "--"
      Microsoft (R) Build Engine version 16.6.0+5ff7b0c9e for .NET Framework
      Copyright (C) Microsoft Corporation. All rights reserved.
    
        Auto build dll exports
        sentencepiece.vcxproj -> C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\sentencepiece.dll
        sentencepiece-static.vcxproj -> C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\sentencepiece.lib
        Auto build dll exports
           Creating library C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/OpenSource/syntaxdot/target/release/build/sentencepiece-sys-2d2a243b21575a42/out/build/src/Release/sentencepiece_train_import.lib and object C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/OpenSource/syntaxdot/target/release/build/sentencepiece-sys-2d2a243b21575a42/out/build/src/Release/sentencepiece_train_import.exp
      trainer_interface.obj : error LNK2019: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_unk_piece_" (?_i_give_permission_to_break_this_code_default_unk_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) referenced in function "public: bool __cdecl <lambda_00046828aa1a5cfb8c470ee6e720106a>::operator()(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,enum sentencepiece::ModelProto_SentencePiece_Type)const " (??R<lambda_00046828aa1a5cfb8c470ee6e720106a>@@QEBA_NAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@W4ModelProto_SentencePiece_Type@sentencepiece@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2001: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_unk_piece_" (?_i_give_permission_to_break_this_code_default_unk_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      trainer_interface.obj : error LNK2019: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_bos_piece_" (?_i_give_permission_to_break_this_code_default_bos_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) referenced in function "public: bool __cdecl <lambda_00046828aa1a5cfb8c470ee6e720106a>::operator()(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,enum sentencepiece::ModelProto_SentencePiece_Type)const " (??R<lambda_00046828aa1a5cfb8c470ee6e720106a>@@QEBA_NAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@W4ModelProto_SentencePiece_Type@sentencepiece@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2001: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_bos_piece_" (?_i_give_permission_to_break_this_code_default_bos_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      trainer_interface.obj : error LNK2019: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_eos_piece_" (?_i_give_permission_to_break_this_code_default_eos_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) referenced in function "public: bool __cdecl <lambda_00046828aa1a5cfb8c470ee6e720106a>::operator()(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,enum sentencepiece::ModelProto_SentencePiece_Type)const " (??R<lambda_00046828aa1a5cfb8c470ee6e720106a>@@QEBA_NAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@W4ModelProto_SentencePiece_Type@sentencepiece@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2001: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_eos_piece_" (?_i_give_permission_to_break_this_code_default_eos_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      trainer_interface.obj : error LNK2019: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_pad_piece_" (?_i_give_permission_to_break_this_code_default_pad_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) referenced in function "public: bool __cdecl <lambda_00046828aa1a5cfb8c470ee6e720106a>::operator()(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,enum sentencepiece::ModelProto_SentencePiece_Type)const " (??R<lambda_00046828aa1a5cfb8c470ee6e720106a>@@QEBA_NAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@W4ModelProto_SentencePiece_Type@sentencepiece@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2001: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_pad_piece_" (?_i_give_permission_to_break_this_code_default_pad_piece_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      trainer_interface.obj : error LNK2019: unresolved external symbol "const sentencepiece::ModelProto::`vftable'" (??_7ModelProto@sentencepiece@@6B@) referenced in function "private: class sentencepiece::util::Status __cdecl sentencepiece::TrainerInterface::SaveModel(class absl::string_view)const " (?SaveModel@TrainerInterface@sentencepiece@@AEBA?AVStatus@util@2@Vstring_view@absl@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      unigram_model_trainer.obj : error LNK2001: unresolved external symbol "const sentencepiece::ModelProto::`vftable'" (??_7ModelProto@sentencepiece@@6B@) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2001: unresolved external symbol "const sentencepiece::ModelProto::`vftable'" (??_7ModelProto@sentencepiece@@6B@) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      unigram_model_trainer.obj : error LNK2019: unresolved external symbol "class sentencepiece::TrainerSpecDefaultTypeInternal sentencepiece::_TrainerSpec_default_instance_" (?_TrainerSpec_default_instance_@sentencepiece@@3VTrainerSpecDefaultTypeInternal@1@A) referenced in function "public: virtual bool __cdecl sentencepiece::ModelInterface::ByteFallbackEnabled(void)const " (?ByteFallbackEnabled@ModelInterface@sentencepiece@@UEBA_NXZ) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2019: unresolved external symbol "private: static class google::protobuf::internal::LazyString const sentencepiece::TrainerSpec::_i_give_permission_to_break_this_code_default_unk_surface_" (?_i_give_permission_to_break_this_code_default_unk_surface_@TrainerSpec@sentencepiece@@0VLazyString@internal@protobuf@google@@B) referenced in function "class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > __cdecl sentencepiece::PrintProto(class sentencepiece::TrainerSpec const &,class absl::string_view)" (?PrintProto@sentencepiece@@YA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@AEBVTrainerSpec@1@Vstring_view@absl@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2019: unresolved external symbol "const sentencepiece::TrainerSpec::`vftable'" (??_7TrainerSpec@sentencepiece@@6B@) referenced in function "public: static class sentencepiece::util::Status __cdecl sentencepiece::SentencePieceTrainer::Train(class std::unordered_map<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct std::hash<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >,struct std::equal_to<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >,class
    std::allocator<struct std::pair<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const ,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > > const &,class sentencepiece::SentenceIterator *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > *)" (?Train@SentencePieceTrainer@sentencepiece@@SA?AVStatus@util@2@AEBV?$unordered_map@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V12@U?$hash@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@U?$equal_to@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@2@V?$allocator@U?$pair@$$CBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@V12@@std@@@2@@std@@PEAVSentenceIterator@2@PEAV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@6@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      sentencepiece_trainer.obj : error LNK2019: unresolved external symbol "const sentencepiece::NormalizerSpec::`vftable'" (??_7NormalizerSpec@sentencepiece@@6B@) referenced in function "public: static class sentencepiece::NormalizerSpec __cdecl sentencepiece::SentencePieceTrainer::GetNormalizerSpec(class absl::string_view)" (?GetNormalizerSpec@SentencePieceTrainer@sentencepiece@@SA?AVNormalizerSpec@2@Vstring_view@absl@@@Z) [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
      C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\sentencepiece_train.dll : fatal error LNK1120: 9 unresolved externals [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\sentencepiece_train.vcxproj]
        sentencepiece_train-static.vcxproj -> C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\sentencepiece_train.lib
      spm_decode_main.obj : error LNK2019: unresolved external symbol "class absl::Flag<int> FLAGS_minloglevel" (?FLAGS_minloglevel@@3V?$Flag@H@absl@@A) referenced in function main [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_decode.vcxproj]
      spm_decode_main.obj : error LNK2019: unresolved external symbol "const sentencepiece::SentencePieceText::`vftable'" (??_7SentencePieceText@sentencepiece@@6B@) referenced in function main [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_decode.vcxproj]
      C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\spm_decode.exe : fatal error LNK1120: 2 unresolved externals [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_decode.vcxproj]
      spm_encode_main.obj : error LNK2019: unresolved external symbol "class absl::Flag<int> FLAGS_minloglevel" (?FLAGS_minloglevel@@3V?$Flag@H@absl@@A) referenced in function main [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_encode.vcxproj]
      spm_encode_main.obj : error LNK2019: unresolved external symbol "const sentencepiece::SentencePieceText::`vftable'" (??_7SentencePieceText@sentencepiece@@6B@) referenced in function main [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_encode.vcxproj]
      spm_encode_main.obj : error LNK2019: unresolved external symbol "const sentencepiece::NBestSentencePieceText::`vftable'" (??_7NBestSentencePieceText@sentencepiece@@6B@) referenced in function main [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_encode.vcxproj]
      C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\spm_encode.exe : fatal error LNK1120: 3 unresolved externals [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_encode.vcxproj]
      spm_export_vocab_main.obj : error LNK2019: unresolved external symbol "class absl::Flag<int> FLAGS_minloglevel" (?FLAGS_minloglevel@@3V?$Flag@H@absl@@A) referenced in function main [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_export_vocab.vcxproj]
      C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\Release\spm_export_vocab.exe : fatal error LNK1120: 1 unresolved externals [C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target\release\build\sentencepiece-sys-2d2a243b21575a42\out\build\src\spm_export_vocab.vcxproj]
    
      --- stderr
      thread 'main' panicked at '
      command did not execute successfully, got: exit code: 1
    
      build script failed, must exit now', C:\Users\Jan\.cargo\registry\src\github.com-1ecc6299db9ec823\cmake-0.1.45\src\lib.rs:894:5
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    warning: build failed, waiting for other jobs to finish...
    error: failed to compile `syntaxdot-cli v0.4.0 (C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\syntaxdot-cli)`, intermediate artifacts can be found at `C:\Users\Jan\Dropbox\Work\RForgeBNOSAC\OpenSource\syntaxdot\target`
    
    Caused by:
      build failed
    

    In my R wrapper around sentencepiece, protobuf is included; I don't know the setup here.

    opened by jwijffels 1
Releases(0.4.1)
  • 0.4.1(Aug 16, 2021)

  • 0.4.0(Aug 15, 2021)

    Added

    • Add support for parallelizing annotation at the batch level. Until now, SyntaxDot has relied on PyTorch inter/intraop parallelization; this change adds parallelization at the batch level. Annotation-level parallelization can be configured with the annotation-threads command-line option of syntaxdot annotate (see the sketch after this list).

    • Add ReLU (relu) as an option for the non-linearity in the feed-forward transformer layers. This is much faster on systems where no vectorized version of the normal distribution CDF is available (currently Apple M1).

    • The non-linearity that is used in the biaffine feed-forward layers is now configurable. For example:

      [biaffine]
      activation = "relu"
      

      When this option is absent, the GELU activation (gelu) will be used as the default.
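
    A hedged sketch of annotation-level parallelization, assuming the option is spelled --annotation-threads and that syntaxdot annotate takes the model configuration as its first argument with CoNLL-U on standard input and output (check syntaxdot annotate --help for the exact interface):

      # Hypothetical invocation: four annotation threads, placeholder file names.
      syntaxdot annotate --annotation-threads 4 syntaxdot.conf < input.conllu > output.conllu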

    Changed

    • The license of SyntaxDot has changed from the Blue Oak Model License 1.0 to the MIT License or Apache License version 2.0 (at your option).

    • SyntaxDot now uses dynamic batch sizes. Before this change, the batch size (--batch-size) was specified as the number of sentences per batch. Since sentences are sorted by length before batching, annotation is performed on batches with roughly equisized sequences. However, later batches required more computations per batch due to longer sequence lengths.

      This change replaces the --batch-size option with the --max-batch-pieces option. This option specifies the number of word/sentence pieces that a batch should contain. SyntaxDot annotation creates batches that contain at most that number of pieces. The only exception is single sentences that are longer than the maximum number of batch pieces.

      With this change, annotating each batch is approximately the same amount of work, which yields a performance increase of roughly 10%.

      Since the batch size is no longer fixed, the readahead (--readahead) is now specified as a number of sentences (see the sketch after this list).

    • Update to libtorch 1.9.0 and tch 0.5.0.

    • Change the default number of inter/intraop threads to 1. Use 4 threads for annotation-level parallelization. This has been shown to be faster for all models, both on AMD Ryzen and Apple M1.
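
    A hypothetical invocation combining the new batch options; the flag names --max-batch-pieces and --readahead come from this changelog, while the values, argument order, and use of standard input/output are only illustrative:

      # Batches of at most 1000 pieces, readahead of 100 sentences (illustrative values).
      syntaxdot annotate --max-batch-pieces 1000 --readahead 100 syntaxdot.conf < input.conllu > output.conllu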

    Source code(tar.gz)
    Source code(zip)
    syntaxdot-0.4.0-cpu-x86_64-linux-gnu-gcc.tar.gz(117.99 MB)
  • 0.3.1(Jun 29, 2021)

    Fixed

    • Apply biaffine dependency encoding before sequence labeling, so that the TüBa-D/Z lemma decoder has access to dependency relations.
    Source code(tar.gz)
    Source code(zip)
  • 0.3.0(Mar 22, 2021)

    You can also download ready-to-use models.

    Added

    • Support for biaffine dependency parsing (Dozat & Manning, 2016). Biaffine parsing is enabled through the biaffine configuration option.
    • Support for pooling the pieces of a token by taking the mean of the pieces. This type of pooling is enabled by setting the model.pooler option to mean. The old behavior of discarding continuation pieces is used when this option is set to discard.
    • Add the keep-best option to the finetune and distill subcommands. With this option, only the parameter files for the N best epochs/steps are retained during training.
    • Support for hidden layer distillation loss. This loss uses the mean squared error between the teacher's and the student's hidden layer representations for faster convergence.

    Changed

    • Update to libtorch 1.8.0 and tch 0.4.0.
    • Pretrained models are now loaded from the libtorch OutputArchive format, rather than the HDF5 format. This removes HDF5 as a dependency.
    • Properly prefix embeddings with embeddings rather than encoder in BERT/RoBERTa models. Warning: This breaks compatibility with BERT and RoBERTa models from prior versions of SyntaxDot and sticker2, which should be retrained.
    • Implementations of Tokenizer are now required to put a piece that marks the beginning of a sentence before the first token piece. BertTokenizer was the only tokenizer that did not fulfill this requirement. BertTokenizer is updated to insert the [CLS] piece as a beginning of sentence marker. Warning: this breaks existing models with tokenizer = "bert", which should be retrained.
    • Replace calls to the Rust Torch crate (tch) by fallible counterparts; this makes exceptions thrown by Torch far easier to read.
    • Uses of the eprintln! macro are replaced by logging using log and env_logger. The verbosity of the logs can be controlled with the RUST_LOG environment variable (e.g. RUST_LOG=info).
    • Replace tfrecord by our own minimalist TensorBoard summary writing, removing 92 dependencies.

    Removed

    • Support for hard loss is removed from the distillation subcommand. Hard loss never worked well compared to soft loss.

    Fixed

    • Fix an off-by-one slicing error in SequenceClassifiers::top_k.
    Source code(tar.gz)
    Source code(zip)
    syntaxdot-0.3.0-cpu-x86_64-linux-gnu-gcc.tar.gz(112.80 MB)
  • 0.2.2(Feb 26, 2021)

    Add the keep-best option to the finetune command. With this option, only the parameter files for the N best epochs are retained during finetuning. The same option for distill is renamed from keep-best-steps to keep-best.
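
    A hedged sketch of such a finetuning run, assuming finetune takes the configuration, a pretrained model, training data, and validation data as positional arguments (as in the finetune invocation quoted in the issues above) and that --keep-best takes the number of epochs to retain:

      # Hypothetical invocation: keep the three best epochs, train on GPU 0.
      syntaxdot finetune --keep-best 3 --gpu 0 syntaxdot.conf pretrained-model.pt train.conllu dev.conllu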

    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Feb 26, 2021)

  • 0.2.0(Nov 19, 2020)

    • Add the SqueezeBERT model (Iandola et al., 2020). The SqueezeBERT model replaces the matrix multiplications in the self-attention mechanism and feed-forward layers by grouped convolutions. This results in a smaller number of parameters and better computational performance.

    • Add the SqueezeAlbert model. This model combines SqueezeBERT (Iandola et al., 2020) and ALBERT (Lan et al., 2020).

    • distill: add the attention-loss option. Enabling this option adds the mean squared error (MSE) of the teacher and student attentions to the loss. This can speed up convergence, because the student learns to attend to the same pieces as the teacher.

      Attention loss can only be computed when the teacher and student have the same sequence lengths. This means practically that they should use the same piece tokenizers.

    • Switch to the AdamW optimizer provided by libtorch. The tch binding now has support for the AdamW optimizer and for parameter groups. Consequently, we do not need our own AdamW optimizer implementation anymore. Switching to the Torch optimizer also speeds up training a bit.

    • Move the subword tokenizers into a separate syntaxdot-tokenizers crate.

    • Update to libtorch 1.7.0.

    • Remove the server subcommand. The new REST server is a better replacement, which supports proper error handling, etc.

    Source code(tar.gz)
    Source code(zip)
  • 0.1.0(Oct 23, 2020)

Owner
TensorDot
bottom encodes UTF-8 text into a sequence comprised of bottom emoji

bottom encodes UTF-8 text into a sequence comprised of bottom emoji (with , sprinkled in for good measure) followed by ????. It can encode any valid UTF-8 - being a bottom transcends language, after all - and decode back into UTF-8.

Bottom Software Foundation 345 Dec 30, 2022
A seedable Owen-scrambled Sobol sequence.

Sobol-Burley A seedable Owen-scrambled Sobol sequence based on the paper Practical Hash-based Owen Scrambling by Brent Burley, but with an improved ha

Nathan Vegdahl 7 Jul 16, 2022
A "Navie" Implementation of the Wavefront Algorithm For Sequence Alignment with Gap-Affine Scoring

A "Naive" Implementation of the Wavefront Algorithm for Sequence Alignment with Gap-Affine Scoring This repository contains some simple code that I wr

Jason Chin 3 Jul 24, 2023
A Markdown to HTML compiler and Syntax Highlighter, built using Rust's pulldown-cmark and tree-sitter-highlight crates.

A blazingly fast( possibly the fastest) markdown to html parser and syntax highlighter built using Rust's pulldown-cmark and tree-sitter-highlight crate natively for Node's Foreign Function Interface.

Ben Wishovich 48 Nov 11, 2022
A sweet n' simple pastebin with syntax highlighting and no client-side code!

sweetpaste sweetpaste is a sweet n' simple pastebin server. It's completely server-side, with zero client-side code. Configuration The configuration w

Lucy 0 Sep 4, 2022
Source text parsing, lexing, and AST related functionality for Deno

Source text parsing, lexing, and AST related functionality for Deno.

Deno Land 90 Jan 1, 2023
Difftastic is an experimental structured diff tool that compares files based on their syntax.

Difftastic is an experimental structured diff tool that compares files based on their syntax.

Wilfred Hughes 13.9k Jan 2, 2023
better tools for text parsing

nom-text Goal: a library that extends nom to provide better tools for text formats (programming languages, configuration files). current needs Recogni

null 5 Oct 18, 2022
Checks all your documentation for spelling and grammar mistakes with hunspell and a nlprule based checker for grammar

cargo-spellcheck Check your spelling with hunspell and/or nlprule. Use Cases Run cargo spellcheck --fix or cargo spellcheck fix to fix all your docume

Bernhard Schuster 274 Nov 5, 2022
A simple and fast linear algebra library for games and graphics

glam A simple and fast 3D math library for games and graphics. Development status glam is in beta stage. Base functionality has been implemented and t

Cameron Hart 953 Jan 3, 2023
Text calculator with support for units and conversion

cpc calculation + conversion cpc parses and evaluates strings of math, with support for units and conversion. 128-bit decimal floating points are used

Kasper 82 Jan 4, 2023
A command-line tool and library for generating regular expressions from user-provided test cases

Table of Contents What does this tool do? Do I still need to learn to write regexes then? Current features How to install? 4.1 The command-line tool 4

Peter M. Stahl 5.8k Dec 30, 2022
Find and replace text in source files

Ruplacer Find and replace text in source files: $ ruplacer old new src/ Patching src/a_dir/sub/foo.txt -- old is everywhere, old is old ++ new is ever

Tanker 331 Dec 28, 2022
An efficient and powerful Rust library for word wrapping text.

Textwrap Textwrap is a library for wrapping and indenting text. It is most often used by command-line programs to format dynamic output nicely so it l

Martin Geisler 322 Dec 26, 2022
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

regex A Rust library for parsing, compiling, and executing regular expressions. Its syntax is similar to Perl-style regular expressions, but lacks a f

The Rust Programming Language 2.6k Jan 8, 2023
Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance calculations and string search.

triple_accel Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance cal

Daniel Liu 75 Jan 8, 2023
Web 3.0 Realized with Traceless Privacy and Seamless Compatibility

Automata Build On Ubuntu/Debian (or similar distributions on WSL), install the following packages: sudo apt-get update sudo apt-get install -y build-e

Automata Network 81 Nov 29, 2022
Text Expression Runner – Readable and easy to use text expressions

ter - Text Expression Runner ter is a cli to run text expressions and perform basic text operations such as filtering, ignoring and replacing on the c

Maximilian Schulke 72 Jul 31, 2022
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

Hugging Face 6.2k Jan 5, 2023