Hi,
during fine-tuning I ran:
syntaxdot finetune syntaxdot.conf bert-base-german-syntaxdot.pt tuebadz-conllu-new.conllu tuebadz-dev.conllu --gpu 0 --label-smoothing 0.03 --maxlen 100 --warmup 10000
which throws the error:
Error: Cannot construct model
Caused by:
Internal torch error: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from distribution_nullary_kernel at /pytorch/aten/src/ATen/native/cuda/DistributionTemplates.h:158 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7fda869591d9 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libc10.so)
frame #1: + 0x100c380 (0x7fdb2fd12380 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so)
frame #2: void at::native::(anonymous namespace)::distribution_nullary_kernel<float, float, 4, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::TensorIteratorBase&, at::CUDAGeneratorImpl, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>), &(void at::native::templates::cuda::normal_and_transform<float, float, 4ul, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float> >(at::TensorIteratorBase&, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>)), 2u>>, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float> >(at::TensorIteratorBase&, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::TensorIteratorBase&, at::CUDAGeneratorImpl, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>), &(void at::native::templates::cuda::normal_and_transform<float, float, 4ul, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, 
at::CUDAGeneratorImpl*)), 2u>, float, float> >(at::TensorIteratorBase&, at::CUDAGeneratorImpl*, __nv_dl_wrapper_t<__nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>)), 2u>> const&, __nv_dl_wrapper_t<nv_dl_tag<void ()(at::Tensor&, double, double, at::CUDAGeneratorImpl), &(void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*)), 2u>, float, float>) + 0x9ae (0x7fdb306d296e in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so)
frame #3: void at::native::templates::cuda::normal_kernelat::CUDAGeneratorImpl*(at::Tensor&, double, double, at::CUDAGeneratorImpl*) + 0x311 (0x7fdb306d42b1 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so)
frame #4: at::native::normal_kernel(at::Tensor&, double, double, c10::optionalat::Generator) + 0xbf (0x7fdb306d013f in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so)
frame #5: + 0x111667e (0x7fda87cbc67e in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #6: at::native::normal(at::Tensor&, double, double, c10::optionalat::Generator) + 0x39 (0x7fda87caf7b9 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #7: + 0x2d66c53 (0x7fdb31a6cc53 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so)
frame #8: + 0x2d66d45 (0x7fdb31a6cd45 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cuda_cu.so)
frame #9: at::Tensor::normal(double, double, c10::optionalat::Generator) const + 0x180 (0x7fda88a991b0 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #10: at::native::randn(c10::ArrayRef, c10::optionalat::Generator, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x56b (0x7fda87ee63db in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #11: at::native::randn(c10::ArrayRef, c10::optionalc10::ScalarType, c10::optionalc10::Layout, c10::optionalc10::Device, c10::optional) + 0x51 (0x7fda87ee64f1 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #12: + 0x1cc0706 (0x7fda88866706 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #13: + 0x1ad94cd (0x7fda8867f4cd in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #14: + 0x1ae2b5b (0x7fda88688b5b in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #15: at::randn(c10::ArrayRef, c10::TensorOptions) + 0x298 (0x7fda882bcb48 in /home/anna/Dokumente/Bert-German-Base/libtorch/lib/libtorch_cpu.so)
frame #16: + 0x2c9da0 (0x563b611a6da0 in syntaxdot)
frame #17: + 0x2b6827 (0x563b61193827 in syntaxdot)
frame #18: + 0x2b6365 (0x563b61193365 in syntaxdot)
frame #19: + 0x2ad5ba (0x563b6118a5ba in syntaxdot)
frame #20: + 0x132aac (0x563b6100faac in syntaxdot)
frame #21: + 0x17d25e (0x563b6105a25e in syntaxdot)
frame #22: + 0x18576d (0x563b6106276d in syntaxdot)
frame #23: + 0x15b775 (0x563b61038775 in syntaxdot)
frame #24: + 0xf3b21 (0x563b60fd0b21 in syntaxdot)
frame #25: + 0x18995f (0x563b6106695f in syntaxdot)
frame #26: + 0x18c523 (0x563b61069523 in syntaxdot)
frame #27: + 0xe8afd (0x563b60fc5afd in syntaxdot)
frame #28: + 0x3bceca (0x563b61299eca in syntaxdot)
frame #29: + 0x189e42 (0x563b61066e42 in syntaxdot)
frame #30: __libc_start_main + 0xf3 (0x7fda8656b0b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #31: + 0x9be1e (0x563b60f78e1e in syntaxdot)
I have CUDA installed on Ubuntu 20.04 (I have reinstalled everything at least 20 times). nvcc -V shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
I need a CUDA 11.1 build of libtorch, so the compiler really has to be 11.1. nvidia-smi reports CUDA version 11.4 (and I cannot change that).
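To make the version comparison above concrete, here is a minimal sketch that extracts the toolkit release from the nvcc -V output. The function name nvcc_release is hypothetical, and the parsing assumes the "release X.Y" wording shown in the output above. As far as I understand, the version printed by nvidia-smi is the highest CUDA version the installed driver supports rather than the installed toolkit, so the two numbers do not have to match.

```python
import re


def nvcc_release(nvcc_output: str):
    """Return the toolkit release as a (major, minor) tuple, or None if absent.

    Hypothetical helper: it only looks for the "release X.Y" wording that
    appears in the nvcc -V output quoted in this post.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The relevant line copied from the nvcc -V output above.
sample = "Cuda compilation tools, release 11.1, V11.1.74"
print(nvcc_release(sample))
```

In practice the text could come from running nvcc -V through subprocess; the sample string is used here so the sketch runs without a CUDA installation.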
I added CUDA to PATH as described in the docs, and I also added libtorch to PATH and LD_LIBRARY_PATH as suggested in doc/install.
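One way to double-check the environment setup described above is to test whether the libtorch library directory really is an entry of LD_LIBRARY_PATH. This is only a sketch: the helper name dir_on_search_path is hypothetical, the directory is the one that appears in the stack trace, and the check assumes a colon-separated search path as used on Linux.

```python
import os


def dir_on_search_path(directory: str, search_path: str) -> bool:
    """Return True if `directory` is an entry of a colon-separated search path.

    Hypothetical helper: paths are normalized so a trailing slash does not
    cause a false negative; symlinks are not resolved.
    """
    wanted = os.path.normpath(directory)
    entries = [entry for entry in search_path.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


# Directory taken from the stack trace above.
libtorch_lib = "/home/anna/Dokumente/Bert-German-Base/libtorch/lib"
print(dir_on_search_path(libtorch_lib, os.environ.get("LD_LIBRARY_PATH", "")))
```

A result of False would suggest the variable is not set in the shell that launches syntaxdot, for example because it was exported in a different session.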
I don't know what else to try. Can anyone help?