Core Temporal SDK that can be used as a base for language-specific Temporal SDKs

Overview


Core SDK that can be used as a base for all other Temporal SDKs.

Getting started

See the Architecture doc for some high-level information.

This repo uses a submodule for upstream protobuf files. The path protos/api_upstream is a submodule; when checking out the repo for the first time, make sure you've run git submodule update --init --recursive. TODO: Makefile.

Dependencies

  • Protobuf compiler

Development

All of the following commands are enforced for each pull request.

Building and testing

You can build and test the project using cargo:

cargo build
cargo test

Formatting

To format all code, run:

cargo fmt --all

Linting

We use clippy for linting. You can run it with:

cargo clippy --all -- -D warnings

Style Guidelines

Error handling

Any error returned from a public interface should be well-typed; we use thiserror for that purpose.

Errors returned from code used only in testing are free to use anyhow for less verbosity.
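For illustration, a well-typed public error might look like the following sketch. PollError and its variants here are hypothetical, not Core's actual types, and the impls are written by hand with std only; thiserror's derive macro generates equivalent Display and Error impls from attributes.

```rust
use std::fmt;

// Hypothetical well-typed public error, for illustration only.
// With thiserror, the Display/Error impls below would come from
// #[derive(Error)] and #[error("...")] attributes instead.
#[derive(Debug, PartialEq)]
pub enum PollError {
    /// The worker was shut down while a poll was outstanding.
    ShutDown,
    /// The underlying transport failed; carries the status message.
    TonicError(String),
}

impl fmt::Display for PollError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PollError::ShutDown => write!(f, "poll loop was shut down"),
            PollError::TonicError(msg) => write!(f, "gRPC error: {}", msg),
        }
    }
}

impl std::error::Error for PollError {}

fn main() {
    // Callers can match on the variant rather than parsing strings.
    let err = PollError::TonicError("deadline exceeded".into());
    println!("{}", err);
}
```

The payoff of the typed approach is that lang bridges can match on variants instead of string-matching error messages.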

Comments
  • Unifying Logging / Metrics w/ OpenTelemetry between Core & Lang

    We've discussed the desire to unify logging across core/lang.

    Right now, core uses https://github.com/tokio-rs/tracing to simultaneously perform traditional logging functions, while also generating OpenTelemetry traces.

    When we consider unifying these concerns across the language barrier, we really want to do two things:

    1. Have consistent "traditional" logging output - IE: What you'd see in the console / a log file.
    2. Pass tracing context across the language barrier

    For 1, one option (a possibly better option is described below) is to expose a log function from core which takes a level parameter, a span parameter, a message parameter, and optionally a series of key/value pairs to be logged (all logging is structured). Then, the lang side has to translate its logging interface to the core logging interface in whatever way feels most natural/idiomatic and pass logging through to core. The core implementation of that function then constructs an Event (https://tracing-rs.netlify.app/tracing/struct.event) corresponding to the passed-in parameters.

    If there's something more automatic we could do here I'm all ears but nothing comes to mind.

    For 2, it seems best for poll_workflow_task and poll_activity_task to return a Span as defined by OpenTelemetry here: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L57. Then, lang can convert that into the Span type of whichever tracing library it has chosen, and proceed from there to pass it around, create new child spans, etc. It would then pass it / such spans back into the logging interface described above as the span parameter.

    While writing the above, I realized there probably is a more elegant option here. The core side can provide an OpenTelemetry "Collector" (https://opentelemetry.io/docs/concepts/data-collection/) which the lang side will need to provide an implementation for. Then, ideally, lang simply uses its OpenTelemetry library, registering the custom collector which is core. On the core side, the collector is just a pass-through to tokio tracing. Lang would be expected to disable all output, relying on core to collect the logs/traces and output them as console/file/tracing etc. This is more work up front, and possibly per-lang, but also feels substantially less hacky. Unfortunately, I think the spans from 2 still need to be passed into lang so that what it's doing can be associated with certain workflow/activity tasks.

    enhancement 
    opened by Sushisource 13
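    The log-function option from point 1 above could be sketched roughly as follows. CoreLogLevel and core_log are hypothetical names for illustration, and this version formats a string; the real implementation would construct a tracing Event instead.

```rust
// Hypothetical sketch of the "expose a log function from core" option.
// Names (CoreLogLevel, core_log) are illustrative, not a real API.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CoreLogLevel {
    Debug,
    Info,
    Warn,
    Error,
}

/// Structured log entry point the lang side would call through the bridge.
/// `span` stands in for an OpenTelemetry span context in the real design;
/// `fields` carries the structured key/value pairs.
pub fn core_log(level: CoreLogLevel, span: &str, message: &str, fields: &[(&str, &str)]) -> String {
    let kv: Vec<String> = fields.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
    // A real implementation would emit a tracing Event here rather than
    // returning a formatted line.
    format!("{:?} [{}] {} {}", level, span, message, kv.join(" "))
}

fn main() {
    let line = core_log(
        CoreLogLevel::Warn,
        "workflow_task",
        "poller died",
        &[("run_id", "abc123")],
    );
    println!("{}", line);
}
```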
  • add activity context for rust sdk

    What was changed

    An activity context that tries to stay as close to the Go SDK as possible. The implementation was added in sdk/src/activity_context.rs; sdk/src/lib.rs was also modified to support the new context for each activity worker function.

    1. Closes https://github.com/temporalio/sdk-core/issues/301

    2. How was this tested:

    • Ran Unit Tests
    • Ran Integration tests
    • Sanity Tested with External Project on Docker-compose temporal environment
    3. Any docs updates needed? Not that I know of. Would be happy to contribute with some direction.
    opened by dt665m 10
  • Sticky Queues interface changes

    This PR seeks to iteratively flesh out the design for any interface changes needed to support sticky task queues.

    Checklist:

    • [x] Add configuration option to enable or disable sticky mode, with a cache size limit
    • [x] Add some way (a new method) to evict specific workflows, or entire cache.
    • [x] Decide on if we should change to using run ids rather than task tokens as the primary identifier returned by polls and provided in activations
      • [x] If so, make that change
    • [x] Add unit tests for multiple worker scenarios
    opened by Sushisource 10
  • What happens when a new pending activation interrupts long poll?

    See original discussion here: https://github.com/temporalio/sdk-core/pull/89

    We need to test and determine best course of action when a new PA interrupts a poll. Ideally it should cancel the request, but there are many potential weird corner cases here like what to do if things race and we somehow get a new task back but don't acknowledge it.

    bug 
    opened by Sushisource 9
  • Legacy query provided at same time as other activation jobs

    Expected Behavior

    The server sends a legacy query and an activity resolution at the same time, and the workflow should be able to respond to both the query and the next command.

    Actual Behavior

    This one is a bit hard to reproduce. You have to make an activity resolution happen at the same time as a query. The query then comes over as a legacy query from the test server, and the workflow responds to the query and starts a child workflow in the same response. Core responds to this with:

    Lang SDK sent us a malformed workflow completion for run (a067d736-1f05-4217-8c4c-4c4c5674c605): Workflow completion had a legacy query response along with other commands. This is not allowed and constitutes an error in the lang SDK

    opened by cretz 8
  • [Feature Request] Dynamic HTTP headers

    Currently it's only possible to configure static HTTP headers on the Client; we need to add a way to specify dynamic ones. A typical use case would be refreshing an auth token: users should not be required to recreate their workers for that.

    enhancement 
    opened by bergundy 6
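    One possible shape for this, as a sketch only (ClientOptions, headers_provider, and headers_for_request are hypothetical names, not the actual client API): hold a callback instead of a fixed map, and invoke it before each request.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical dynamic-header provider: a shared callback invoked per
// request, so an auth token can be refreshed without recreating workers.
type HeaderProvider = Arc<dyn Fn() -> HashMap<String, String> + Send + Sync>;

struct ClientOptions {
    headers_provider: HeaderProvider,
}

impl ClientOptions {
    /// Called before each outgoing request to fetch fresh headers.
    fn headers_for_request(&self) -> HashMap<String, String> {
        (self.headers_provider)()
    }
}

fn main() {
    // Simulate a token that changes on every refresh.
    let counter = Arc::new(AtomicU64::new(0));
    let c = counter.clone();
    let opts = ClientOptions {
        headers_provider: Arc::new(move || {
            let n = c.fetch_add(1, Ordering::SeqCst);
            HashMap::from([("authorization".to_string(), format!("Bearer token-{}", n))])
        }),
    };
    // Two requests see two different tokens.
    println!("{:?}", opts.headers_for_request().get("authorization"));
    println!("{:?}", opts.headers_for_request().get("authorization"));
}
```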
  • Proto refactor

    What was changed

    • Moved protos with separate packages into separate dirs
    • Changed inconsistent wf_* and Wf* to workflow_ and Workflow* respectively

    One thing I didn't do but I want to do is change the coresdk package base to temporal.sdk.core. Some discussions need to be had here.

    Why?

    • It is common practice to not have different proto packages in the same dir
      • Often code generators may choose to use the proto dir/import path instead of the proto package name to choose where to place files
      • It is confusing for an importer to not know based on import what proto package is being used

    Checklist

    1. Closes #247
    opened by cretz 6
  • Added the option to have the activity return a non-retryable error.

    What was changed

    Added a NonRetryableActivityError type so that an ActivityFunction can return this error and the activity is not retried; the workflow will then fail.

    Why?

    To accommodate the use case where you, as the user, know that the activity should not be retried.

    Checklist

    1. Closes: https://github.com/temporalio/sdk-core/issues/454

    2. How was this tested: A test, activity_non_retryable_failure_with_error, was added that uses the new function Failure::application_failure_from_error. I also created a manual test locally, where an activity returns the new error type. Not sure if a test could be added to local_activities to properly test the error conversion code? Could use some pointers here if this is deemed necessary :) Thanks!

    3. Any docs updates needed? Not sure, as there are no currently published Rust docs.

    opened by tdejager 5
  • [Bug] Core SDK warning messages get printed even if log level is set to error

    What are you really trying to do?

    I'm running workflow tests using the Temporal TypeScript SDK. The test output is littered with warning messages from the Core SDK like this:

    2022-07-27T08:21:22.605721Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    

    This is making the test output really hard to read, e.g.

    > yarn test test/workflows/executeFullScan_test.ts
    yarn run v1.22.18
    $ NODE_ENV=test mocha test/workflows/executeFullScan_test.ts
    choma: to re-use this ordering, run tests with CHOMA_SEED=IonALwEbIv
    
    
      workflows
        executeFullScan
      2022-07-27T08:23:55.738759Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
      2022-07-27T08:23:55.754344Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
          ✔ paginates the source and inspects each page (1342ms)
      2022-07-27T08:23:56.436774Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
      2022-07-27T08:23:56.452200Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
          ✔ syncs insights after all pages are processed (697ms)
          queries
            progress
      2022-07-27T08:23:57.223656Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
      2022-07-27T08:23:57.241008Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
              ✔ returns the workflow's progress (797ms)
    
    
      3 passing (3s)
    
    ✨  Done in 5.55s.
    

    I'm trying to suppress that warning from the Core SDK, but have been unable to do so.

    Describe the bug

    In setting up my test env, I'm installing a custom logger into the Core SDK runtime, and with the log level set to ERROR:

    import { TestWorkflowEnvironment } from '@temporalio/testing'
    import { DefaultLogger, Runtime, NativeConnection } from '@temporalio/worker'
    import { WorkflowClient } from '@temporalio/client'
    
    let testEnv: TestWorkflowEnvironment
    export let testClient: WorkflowClient
    export let testConnection: NativeConnection
    
    global.before('Create Test Environment', async () => {
      const logger = new DefaultLogger('ERROR')
      Runtime.install({ logger })
      testEnv = await TestWorkflowEnvironment.create({
        testServer: {
          stdio: 'ignore',
          // stdio: 'inherit'
        },
        logger
      })
      testConnection = testEnv.nativeConnection
      testClient = testEnv.workflowClient
    })
    
    global.after('Tear Down Test Environment', () => {
      return testEnv?.teardown()
    })
    

    To test my workflows, I create a bunch of workers running different (mock) activities and then run them all using the runUntil function, e.g.

          await insights.runUntil(() => {
            return inspection.runUntil(() => {
              return connector.runUntil(() => {
                return orchestrator.runUntil(() => {
                  return testClient.execute(executeFullScan, {
                    args: [...],
                    workflowId: 'test-workflow',
                    taskQueue: orchestrator.options.taskQueue
                  })
                })
              })
            })
          })
    

    The warnings are created when each of these workers shuts down.

    Minimal Reproduction

    https://github.com/jhecking/samples-typescript/pull/1

    That PR adds a simple workflow test to the hello-world sample, which – when run – demonstrates the warning message generated by the Core SDK:

    > yarn test
    yarn run v1.22.18
    $ ts-node src/test.ts
      2022-07-27T08:44:04.290368Z  WARN temporal_sdk_core::worker::workflow::workflow_stream: WFT poller died, shutting down
    
    ✨  Done in 2.67s.
    

    Environment/Versions

    • OS and processor: macOS on x86
    • Temporal Version: Temporal TypeScript SDK v1.0.0
    • Are you using Docker or Kubernetes or building Temporal from source? Doesn't apply since this issue occurs when running unit tests without any Temporal server.

    Additional context

    https://temporalio.slack.com/archives/C01DKSMU94L/p1658892545849429

    bug 
    opened by jhecking 5
  • Initial FFI Draft

    What was changed

    Initial implementation of the bridge FFI. This is still in development, so it lacks a lot of docs.

    Why?

    Go and others need a C API to communicate with core.

    Notes

    1. See #248
    2. This adds a bridge set of protos which is essentially just the Core trait wrapped in proto form. It also provides a gRPC stream service proto.
    3. This adds a bridge-ffi crate which exposes core over C API
    4. This may add a bridge-grpc crate which exposes core over a gRPC stream
    5. We have discussed and decided that protobuf is the encoding language of choice. The serialization/deserialization on both sides of the boundary comes with a known performance cost.
      • This performance cost already exists in core as protobuf is already being used as the encoding between core and lang
      • If we want to improve this, I suggest either https://google.github.io/flatbuffers/ or https://capnproto.org/, both of which operate on byte arrays without explicit serialization and have decent cross-language support
    6. Since core calls are async, we have chosen C function pointer callbacks to report results. There are upsides and downsides to this approach:
      • Anything can be built on callbacks, including event loops or whatever
      • Callbacks do require the lang to respect speed inside the callback. The Rust thread is held up while a callback is executed, so lang must do what it needs to do inside the callback to get the data on its other thread for working (resolve a JS promise, put on a Go channel, etc).
      • As most C APIs do, we allow user data in the callback. This helps callers access data inside the callback and is a very common approach.
      • We could have done an event loop, but you are essentially using function pointers anyway; the only difference is that response handling becomes single-threaded, which doesn't benefit languages that can use more than one thread.
    7. This includes a Go example and may include examples in other languages. The examples may never be in the final product; we just need to prove things work.
    opened by cretz 5
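    The callback approach described in point 6 can be sketched in plain Rust as follows. All names here are hypothetical and no real FFI boundary is crossed; the shape just shows a C function pointer plus a caller-supplied user_data pointer reporting a result.

```rust
use std::os::raw::c_void;

// Hypothetical callback signature the lang side would register: core
// reports a result by invoking it with the caller's user_data pointer.
type PollCallback = extern "C" fn(user_data: *mut c_void, result_len: usize);

extern "C" fn on_poll_done(user_data: *mut c_void, result_len: usize) {
    // Lang side: recover its context from user_data. This runs on the
    // Rust thread, so real implementations should hand the data off to
    // another thread quickly (resolve a promise, push to a channel, etc).
    let got = unsafe { &mut *(user_data as *mut usize) };
    *got = result_len;
}

/// Stand-in for a core call: completes work, then invokes the callback.
fn poll_workflow_task(cb: PollCallback, user_data: *mut c_void) {
    let fake_result = b"serialized proto bytes";
    cb(user_data, fake_result.len());
}

fn main() {
    let mut received: usize = 0;
    poll_workflow_task(on_poll_done, &mut received as *mut usize as *mut c_void);
    println!("callback received {} bytes", received);
}
```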
  • Eliminate pending activations by returning new activations from complete

    What was changed:

    Return new activations from completing them, when required

    Why?

    • Eliminates complexity around pending activations
    • When you call poll, you always poll the server now
    • When you poll, your activation is always for that task queue
    • Eliminates question about how to "buffer" server responses if lang is dealing with pending activations - so closes https://github.com/temporalio/sdk-core/issues/91

    Checklist

    TODO:

    • [ ] Saw stalling behavior if lang side didn't deal with returned activations from a completion (I think this is because workflow task timeout isn't working right)
    • [ ] Cleanup pass
    1. Closes issue: Closes https://github.com/temporalio/sdk-core/issues/91

    2. How was this tested: Existing & new tests

    opened by Sushisource 5
  • [Feature Request] Cancel activities on shutdown

    Currently all lang implementations have to manually cancel activities and have some logic for converting cancelled errors to application failures.

    It'd be nice to move all of that logic into core and avoid some lang side duplication.

    enhancement 
    opened by bergundy 0
  • [Feature Request] Investigate ways to determine if worker polling is healthy

    Is your feature request related to a problem? Please describe.

    There is currently no easy way to know if a worker's poll calls are failing. Users want to make a call on the worker to know whether it's healthy or backing off due to server failure.

    Describe the solution you'd like

    TBD. Options:

    • Create metrics like workflow_task_queue_poll_failed and activity_task_queue_poll_succeed/activity_task_queue_poll_failure and encourage checking those metrics
      • A bit hacky to ask users to do manual subtraction and state management
      • These metrics have value anyway; we should probably add them. long_request_failure is not very detailed (but technically good enough if we exposed a way to create custom metric labels per client).
    • Populate some kind of internal std::sync::atomic::AtomicBool for whether the last poll calls are successful for a worker (or client) and expose some kind of getter to check them
    • Support for general gRPC interceptors from lang through Rust could help advanced uses like this and others
    • Some other on-poll-failed callback mechanism?
    • Customize retry logic for workers so users can opt-in to eagerly failing workers a bit more aggressively
    enhancement 
    opened by cretz 0
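    The AtomicBool option could look roughly like this sketch (WorkerHealth and its methods are hypothetical names, not an existing API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical health flag: the poll loop records whether its last poll
// succeeded, and a getter exposes that state to lang as a health check.
#[derive(Clone)]
pub struct WorkerHealth {
    last_poll_ok: Arc<AtomicBool>,
}

impl WorkerHealth {
    pub fn new() -> Self {
        Self { last_poll_ok: Arc::new(AtomicBool::new(true)) }
    }

    /// Called by the poll loop after each poll attempt.
    pub fn record_poll(&self, succeeded: bool) {
        self.last_poll_ok.store(succeeded, Ordering::Relaxed);
    }

    /// Getter lang could surface as a worker health check.
    pub fn is_healthy(&self) -> bool {
        self.last_poll_ok.load(Ordering::Relaxed)
    }
}

fn main() {
    let health = WorkerHealth::new();
    health.record_poll(false);
    println!("healthy: {}", health.is_healthy());
}
```

    A single flag only captures the most recent poll; a production version would likely want a failure streak count or timestamp to distinguish backoff from transient errors.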
  • [Feature Request] Emit larger `workflow_task_schedule_to_start_latency` bucket sizes

    Is your feature request related to a problem? Please describe.

    From a user:

    We noticed the metric caps out at 10 seconds. If we start a large batch of workflows we are generally fine waiting for them to process for 10-15 minutes. Given the 10 second maximum it’s pretty hard to tell how busy the workflow queue is.

    (small number of workflow task slots)

    enhancement 
    opened by lorensr 0
  • [Feature Request] Add temporal-cli ephemeral server and deprecate temporalite

    Describe the solution you'd like

    temporal_sdk_core::ephemeral_server has stuff for Temporalite right now. We are going to replace it with Temporal CLI which is now available at temporal.download. We should deprecate (not yet remove) Temporalite and make equivalent functions for Temporal CLI.

    enhancement 
    opened by cretz 0
  • [Bug] LA Fixes to port from Java

    Just capturing notes; a handful of things to fix that we learned about while refactoring Java.

    • https://github.com/temporalio/sdk-core/issues/446
    • Local retry threshold tied to WFT timeout - 3x or whatever, for default.
    • Use retry state so we accept changes to the retry policy or timeout adjustments without causing a nondeterminism error.
    • Cancel LAs if the WFT already failed for some other reason; don't orphan LAs.
    bug 
    opened by Sushisource 1
  • [Feature Request] Do not warn about missing activity on activity cancel

    Is your feature request related to a problem? Please describe.

    If I start an activity, cancel it, wait on the task (which has TRY_CANCEL by default), and return from the workflow, I get this warning:

    Activity not found on completion. This may happen if the activity has already been cancelled but completed anyway.

    At least I believe that is what is happening.

    Describe the solution you'd like

    Don't show that log if this is an activity cancel since sometimes it is the workflow leaving that is causing the cancel (and other times the workflow may be gone for other reasons). Or maybe reduce the log level.

    enhancement 
    opened by cretz 0
Owner
temporal.io
Temporal Workflow as Code