Ultralight, security-first service mesh for Kubernetes. Main repo for Linkerd 2.x.

Overview


🎈 Welcome to Linkerd! πŸ‘‹

Linkerd is an ultralight, security-first service mesh for Kubernetes. Linkerd adds critical security, observability, and reliability features to your Kubernetes stack with no code change required.

Linkerd is a Cloud Native Computing Foundation (CNCF) project.

Repo layout

This is the primary repo for the Linkerd 2.x line of development.

The complete list of Linkerd repos can be found in the Linkerd GitHub organization.

Quickstart and documentation

You can run Linkerd on any modern Kubernetes cluster in a matter of seconds. See the Linkerd Getting Started Guide for how.
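
For reference, the basic flow from that guide looks roughly like the following sketch (commands as documented there; exact flags and versions may change between releases):

    # Install the Linkerd CLI on your local machine
    curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
    export PATH=$PATH:$HOME/.linkerd2/bin

    # Validate that your cluster is ready, then install the control plane
    linkerd check --pre
    linkerd install | kubectl apply -f -

    # Verify the installation
    linkerd check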

For more comprehensive documentation, start with the Linkerd docs. (The doc source code is available in the website repo.)

Working in this repo

BUILD.md includes general information on how to work in this repo.

We ❀️ pull requests! See CONTRIBUTING.md for info on contributing changes.

Get involved

Community meetings

We host regular online meetings for contributors, adopters, maintainers, and anyone else interested to connect in a synchronous fashion. These meetings usually take place the last Thursday of the month at 9am Pacific / 4pm UTC.

We're a friendly group, so please feel free to join us!

Steering Committee meetings

We host regular online meetings for the Linkerd Steering Committee. All are welcome to attend, but audio and video participation is limited to Steering Committee members and maintainers. These meetings are currently scheduled on an ad-hoc basis and announced on the linkerd-users mailing list.

Code of Conduct

This project is for everyone. We ask that our users and contributors take a few minutes to review our Code of Conduct.

Security

See SECURITY.md for our security policy, including how to report vulnerabilities.

A third-party security audit was performed by Cure53 in June 2019. You can see the full report here.

License

Copyright 2021 the Linkerd Authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Comments
  • When downstream deployment is updated, upstream pods don't connect to new downstream pods

    When downstream deployment is updated, upstream pods don't connect to new downstream pods

    Given: deploy_a -> deploy_b (a is a client/connects to b)

    Sometimes, not always, when a rolling update of deploy_b is performed (new pods are created), requests made by deploy_a start failing, as if the proxies in the deploy_a pods were not aware that the destination pods for deploy_b had changed. Killing all deploy_a pods (i.e. letting them respawn) fixes the problem.
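
    A minimal sketch of confirming this state and applying the workaround described above, using a current CLI (deployment, service, and namespace names are placeholders; deploy-a/deploy-b stand in for deploy_a/deploy_b):

    # Endpoints Kubernetes currently advertises for the destination service
    kubectl -n my-ns get endpoints deploy-b-svc -o wide

    # What the mesh resolves for that authority (available in recent CLI versions)
    linkerd diagnostics endpoints deploy-b-svc.my-ns.svc.cluster.local:8080

    # Workaround: recreate the client pods so their proxies re-resolve the destination
    kubectl -n my-ns rollout restart deployment deploy-a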

    Unfortunately, I don't have much more to add, as I have not yet had time to dive deeper into the issue and better understand what is going on.

    Also, I apologize if this has already been filed; I did not search existing issues.

    I just wanted to put it out there because I've been bitten by this since the early Conduit versions and have been waiting until I had more details before filing it...

    area/controller priority/P0 bug 
    opened by bourquep 74
  • Support for ARM based architectures?

    Support for ARM based architectures?

    Hi,

    I'm experimenting with a Kubernetes cluster that runs on aarch64 architecture. I'm unable to install Conduit:

    curl https://run.conduit.io/install | sh
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  1353  100  1353    0     0   2142      0 --:--:-- --:--:-- --:--:--  2140
    Downloading conduit-0.4.4-linux...
    Conduit was successfully installed πŸŽ‰
    
    Copy /root/.conduit/bin/conduit into your PATH.  Then run
    
        conduit install | kubectl apply -f -
    
    to deploy Conduit to Kubernetes.  Once deployed, run
    
        conduit dashboard
    
    to view the Conduit UI.
    Visit conduit.io for more information.
    [root@testserver home]# export PATH=$PATH:$HOME/.conduit/bin
    [root@testserver home]# conduit
    -bash: /root/.conduit/bin/conduit: cannot execute binary file
    [root@testserver home]# 
    
    

    The binary that was installed seems to be for x64 architecture:

     readelf -h conduit 
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x456880
      Start of program headers:          64 (bytes into file)
      Start of section headers:          456 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         7
      Size of section headers:           64 (bytes)
      Number of section headers:         13
      Section header string table index: 3
    

    Are there plans to support other architectures besides x64?
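
    As an aside for anyone landing here later: newer Linkerd releases publish multi-arch CLI binaries, so a rough sketch of installing on ARM looks like this (the asset naming is an assumption; verify against the actual release page):

    # Confirm the host architecture
    uname -m   # e.g. x86_64, aarch64

    # Download the matching CLI binary from the GitHub releases page
    VERSION=stable-2.12.3
    curl -sSL -o linkerd \
      "https://github.com/linkerd/linkerd2/releases/download/${VERSION}/linkerd2-cli-${VERSION}-linux-arm64"
    chmod +x linkerd && sudo mv linkerd /usr/local/bin/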

    help wanted area/proxy area/cli area/build 
    opened by jeroenjacobs79 65
  • Linkerd 2.5.0: linkerd2_proxy::app::errors unexpected error: error trying to connect: No route to host (os error 113) (address: 10.10.3.181:8080)

    Linkerd 2.5.0: linkerd2_proxy::app::errors unexpected error: error trying to connect: No route to host (os error 113) (address: 10.10.3.181:8080)

    Bug Report

    What is the issue?

    We have an injected pod which has been running for days, connecting to a partially (1/3) injected deployment, and it eventually throws the mentioned error.

    How can it be reproduced?

    Run a pod for days and let it talk to a deployment which is regularly restarted, causing new pods with new IP addresses, etc.

    Logs, error output, etc

    linkerd-proxy ERR! [589538.085822s] linkerd2_proxy::app::errors unexpected error: error trying to connect: No route to host (os error 113) (address: 10.10.3.181:8080)

    linkerd check output

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API
    
    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version
    
    linkerd-config
    --------------
    √ control plane Namespace exists
    √ control plane ClusterRoles exist
    √ control plane ClusterRoleBindings exist
    √ control plane ServiceAccounts exist
    √ control plane CustomResourceDefinitions exist
    √ control plane MutatingWebhookConfigurations exist
    √ control plane ValidatingWebhookConfigurations exist
    √ control plane PodSecurityPolicies exist
    
    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ control plane replica sets are ready
    √ no unschedulable pods
    √ controller pod is running
    √ can initialize the client
    √ can query the control plane API
    
    linkerd-api
    -----------
    √ control plane pods are ready
    √ control plane self-check
    √ [kubernetes] control plane can talk to Kubernetes
    √ [prometheus] control plane can talk to Prometheus
    √ no invalid service profiles
    
    linkerd-version
    ---------------
    √ can determine the latest version
    √ cli is up-to-date
    
    control-plane-version
    ---------------------
    √ control plane is up-to-date
    √ control plane and cli versions match
    
    Status check results are √
    

    Environment

    • Kubernetes Version: 1.15.2
    • Cluster Environment: custom
    • Host OS: CoreOS 2191.5.0
    • Linkerd version: 2.5.0

    Possible solution

    Additional context

    To me, not knowing all the details, it looks like the proxy is not "refreshing" the endpoints for the service and eventually just runs out of IP addresses. For us it would be fine if the proxy exited and let the pod get restarted.
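
    A rough sketch of checking whether the proxy's view of the endpoints has drifted from what Kubernetes advertises (service, deployment, and namespace names are placeholders; the IP is the one from the log above):

    # Endpoints Kubernetes currently considers ready for the destination service
    kubectl -n my-ns get endpoints my-svc

    # Does any pod still own the address the proxy keeps dialing?
    kubectl get pods --all-namespaces -o wide | grep 10.10.3.181

    # Workaround until the root cause is found: recreate the long-running client pods
    kubectl -n my-ns rollout restart deployment my-client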

    Also: Linkerd is pretty awesome, thanks for all the effort you put into it!

    bug needs/repro bug/staleness 
    opened by bjoernhaeuser 57
  • Error from gRPC .NET client throws error "proxy max-concurrency exhausted"

    Error from gRPC .NET client throws error "proxy max-concurrency exhausted"

    Bug Report

    For the past few days I have been receiving this error: Grpc.Core.RpcException: Status(StatusCode=Unavailable, Detail="proxy max-concurrency exhausted")

    What is the issue?

    Problem connecting from a .NET client to a Python gRPC server on Kubernetes, using Linkerd as the service mesh.

    How can it be reproduced?

    I can't reproduce this myself; it has only happened in the production environment over the last few days.
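
    In the meantime, a simple sketch of confirming how often the proxy hits this limit (pod and namespace names are placeholders):

    # Look for the error in the client pod's proxy logs
    kubectl -n my-ns logs my-client-pod -c linkerd-proxy | grep -i "max-concurrency"

    # Dump the proxy's Prometheus metrics for closer inspection
    # (metric names vary by proxy version, so just page through them)
    linkerd metrics -n my-ns po/my-client-pod | less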

    Logs, error output, etc

    (If the output is long, please create a gist and paste the link here.)

    linkerd check output

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API
    
    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version
    
    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ heartbeat ServiceAccount exist
    √ control plane replica sets are ready
    √ no unschedulable pods
    √ controller pod is running
    √ can initialize the client
    √ can query the control plane API
    
    linkerd-config
    --------------
    √ control plane Namespace exists
    √ control plane ClusterRoles exist
    √ control plane ClusterRoleBindings exist
    √ control plane ServiceAccounts exist
    √ control plane CustomResourceDefinitions exist
    √ control plane MutatingWebhookConfigurations exist
    √ control plane ValidatingWebhookConfigurations exist
    √ control plane PodSecurityPolicies exist
    
    linkerd-identity
    ----------------
    √ certificate config is valid
    √ trust anchors are using supported crypto algorithm
    √ trust anchors are within their validity period
    √ trust anchors are valid for at least 60 days
    √ issuer cert is using supported crypto algorithm
    √ issuer cert is within its validity period
    √ issuer cert is valid for at least 60 days
    √ issuer cert is issued by the trust anchor
    
    linkerd-api
    -----------
    √ control plane pods are ready
    √ control plane self-check
    √ [kubernetes] control plane can talk to Kubernetes
    √ [prometheus] control plane can talk to Prometheus
    √ tap api service is running
    
    linkerd-version
    ---------------
    √ can determine the latest version
    √ cli is up-to-date
    
    control-plane-version
    ---------------------
    √ control plane is up-to-date
    √ control plane and cli versions match
    
    linkerd-addons
    --------------
    √ 'linkerd-config-addons' config map exists
    
    linkerd-grafana
    ---------------
    √ grafana add-on service account exists
    √ grafana add-on config map exists
    √ grafana pod is running
    
    Status check results are √
    

    Environment

    • Kubernetes Version: 1.15.12
    • Cluster Environment: AKS
    • Host OS: Linux
    • Linkerd version: 2.8.1 (stable)

    Possible solution

    Additional context

    needs/more 
    opened by rafalkasa 45
  • Pod can't reliably establish watches properly

    Pod can't reliably establish watches properly

    Bug Report

    What is the issue?

    I am running the latest version of linkerd edge 19.1.2 and I am getting this error

    WARN admin={bg=resolver} linkerd2_proxy::control::destination::background::destination_set Destination.Get stream errored for NameAddr { name: DnsName(DNSName("cs-ch-domain-manager-v1.content-hub-test.svc.cluster.local.")), port: 8080 }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
    

    How can it be reproduced?

    I just deployed the latest version. Nothing more

    Logs, error output, etc

    output for linkerd logs --control-plane-component controller

    linkerd linkerd-controller-7bc49fd77f-lwt8q linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
    

    output for linkerd logs --control-plane-component controller -c proxy-api

    linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T13:54:55Z" level=info msg="Stopping watch on endpoint cs-ch-domain-manager-v1.content-hub-test:8080"
    linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api W0121 15:57:34.899318       1 reflector.go:341] k8s.io/client-go/informers/factory.go:130: watch of *v1beta2.ReplicaSet ended with: too old resource version: 3417120 (3420499)
    linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T17:25:43Z" level=info msg="Establishing watch on endpoint cs-ch-domain-manager-v1.content-hub-test:8080"
    linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T17:32:18Z" level=info msg="Stopping watch on endpoint cs-ch-domain-manager-v1.content-hub-test:8080"
    linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api W0121 17:49:54.531144       1 reflector.go:341] k8s.io/client-go/informers/factory.go:130: watch of *v1beta2.ReplicaSet ended with: too old resource version: 3437967 (3439015)
    linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T21:32:21Z" level=info msg="Establishing watch on endpoint linkerd-prometheus.linkerd:9090"
    

    (If the output is long, please create a gist and paste the link here.)

    linkerd check output

    kubernetes-api
    --------------
    βœ” can initialize the client
    βœ” can query the Kubernetes API
    
    kubernetes-version
    ------------------
    βœ” is running the minimum Kubernetes API version
    
    linkerd-existence
    -----------------
    βœ” control plane namespace exists
    βœ” controller pod is running
    βœ” can initialize the client
    βœ” can query the control plane API
    
    linkerd-api
    -----------
    βœ” control plane pods are ready
    βœ” can query the control plane API
    βœ” [kubernetes] control plane can talk to Kubernetes
    βœ” [prometheus] control plane can talk to Prometheus
    
    linkerd-service-profile
    -----------------------
    βœ” no invalid service profiles
    
    linkerd-version
    ---------------
    βœ” can determine the latest version
    βœ” cli is up-to-date
    
    control-plane-version
    ---------------------
    βœ” control plane is up-to-date
    
    Status check results are βœ”
    

    Environment

    • Kubernetes Version:
    • Cluster Environment: EKS
    • Host OS: Amazon ami
    • Linkerd version: edge 19.1.2

    Possible solution

    Additional context

    area/controller priority/P0 bug 
    opened by jmirc 43
  • Intermittent 502 status code

    Intermittent 502 status code

    Bug Report

    What is the issue?

    We have a RESTful application where the client receives intermittent 502 status codes, but the application itself logs a 201. If we disable Linkerd2 we are unable to reproduce this issue.

    The basic traffic flow is as follows (all supposedly HTTP/1.1): Client -> Ambassador (Envoy) -(via linkerd2)-> App (see additional context for a diagram)

    So far this only happens for one particular route, which also calls a cluster-external HTTP service; no other routes are affected.

    How can it be reproduced?

    We tried to reproduce this with an artificial setup, using Ambassador as the ingress and httpbin as the application, both meshed with Linkerd2. However, this was unsuccessful, and we were unable to reproduce it outside our production deployments or with other routes.
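
    Not part of the original report, but one way to watch the failing route live when it does happen (deployment and namespace names are placeholders):

    # Stream requests from the Ambassador pods to the app, with response codes
    linkerd tap deploy/ambassador --to deploy/my-app -n my-ns

    # Aggregate success rate and latency for the app as seen by the proxies
    linkerd stat deploy/my-app -n my-ns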

    Logs, error output, etc

    In the linkerd sidecar attached to ambassador the following error pops up, whenever the route fails:

    [figo-ambassador-586c797dc-p9pt8 linkerd-proxy] WARN [  1861.009733s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} linkerd2_proxy::proxy::http::orig_proto unknown l5d-orig-proto header value: "-"
    [figo-ambassador-586c797dc-p9pt8 linkerd-proxy] WARN [  1861.009760s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} hyper::proto::h1::role response with HTTP2 version coerced to HTTP/1.1
    [figo-ambassador-586c797dc-p9pt8 linkerd-proxy] ERR! [  1864.515657s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.73.113:44428} linkerd2_proxy::app::errors unexpected error: http2 general error: protocol error: unspecific protocol error detected
    [figo-ambassador-586c797dc-7s9x6 linkerd-proxy] ERR! [  1833.975088s] proxy={server=out listen=127.0.0.1:4140 remote=10.7.69.131:57912} linkerd2_proxy::app::errors unexpected error: http2 general error: protocol error: unspecific protocol error detected
    

    (The warnings were caused by a previous successful call)

    We increased the log level via config.linkerd.io/proxy-log-level: trace; the resulting logs are here: https://gist.github.com/trevex/ca0791aad3402137ed551b251970d329

    linkerd check output

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API
    
    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version
    
    linkerd-existence
    -----------------
    √ control plane namespace exists
    √ controller pod is running
    √ can initialize the client
    √ can query the control plane API
    
    linkerd-api
    -----------
    √ control plane pods are ready
    √ control plane self-check
    √ [kubernetes] control plane can talk to Kubernetes
    √ [prometheus] control plane can talk to Prometheus
    √ no invalid service profiles
    
    linkerd-version
    ---------------
    √ can determine the latest version
    √ cli is up-to-date
    
    control-plane-version
    ---------------------
    √ control plane is up-to-date
    √ control plane and cli versions match
    
    Status check results are √
    

    Environment

    • Kubernetes Version: 1.14.1
    • Cluster Environment: bare-metal
    • Host OS: ContainerLinux
    • Linkerd version: stable-2.3.0
    • CNI: Cilium 1.4.4
    • DNS: CoreDNS 1.5.0

    Possible solution

    Additional context

    Diagram from Slack: https://files.slack.com/files-pri/T0JV2DX9R-FJA61H9CH/ambassador-linkerd2.png

    Please let me know if I can provide more information :)

    area/proxy priority/P0 needs/repro 
    opened by trevex 42
  • All linkerd components in Init:CrashLoopBackOff

    All linkerd components in Init:CrashLoopBackOff

    Having the issue below with all Linkerd components, roughly 48 hours after pod creation.

    Environment:

    • Kubernetes: GitVersion "v1.13.2"
    • Docker package: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
    • Linkerd version: Client edge-19.10.1, Server edge-19.10.1
    • Cluster: virtual machines in Google Cloud Platform, host OS CentOS Linux release 7.6.1810 (Core)

    kubectl get pod -n linkerd
    NAME                                      READY   STATUS                  RESTARTS   AGE
    linkerd-controller-69d84c4f8c-nd96z       0/2     Init:CrashLoopBackOff   544        2d
    linkerd-destination-77bcd7497c-57gqf      0/2     Init:CrashLoopBackOff   544        2d
    linkerd-grafana-69b7c55969-mf4h5          0/2     Init:CrashLoopBackOff   544        2d
    linkerd-identity-6b6854c8f7-mcw74         0/2     Init:Error              545        2d
    linkerd-prometheus-9d59769cc-rjmf8        0/2     Init:CrashLoopBackOff   545        2d
    linkerd-proxy-injector-686fd49d85-p2cfc   0/2     Init:CrashLoopBackOff   544        2d
    linkerd-sp-validator-77867c74fd-8zgw7     0/2     Init:CrashLoopBackOff   545        2d
    linkerd-tap-6c647878c5-bpc2l              0/2     Init:CrashLoopBackOff   545        2d
    linkerd-web-7dc9c4b794-vlhqg              0/2     Init:CrashLoopBackOff   544        2d
    
    

    Snippet from pod description:

    kubectl describe pod linkerd-destination-77bcd7497c-57gqf -n linkerd
    
    ---
    Init Containers:
      linkerd-init:
        Container ID:  docker://e0cd95a592055a5f8e3a758a324a7706a90f74e44d5f753ff697e7a3a379086b
        Image:         gcr.io/linkerd-io/proxy-init:v1.2.0
        Image ID:      docker-pullable://gcr.io/linkerd-io/proxy-init@sha256:c0174438807cdd711867eb1475fba3dd959d764358de4e5f732177e07a75925b
        Port:          <none>
        Host Port:     <none>
        Args:
          --incoming-proxy-port
          4143
          --outgoing-proxy-port
          4140
          --proxy-uid
          2102
          --inbound-ports-to-ignore
          4190,4191
          --outbound-ports-to-ignore
          443
        State:       Waiting
          Reason:    CrashLoopBackOff
        Last State:  Terminated
          Reason:    Error
          Message:   2019/10/11 12:52:18 < iptables: Too many links.
    
    2019/10/11 12:52:18 Will ignore port 4190 on chain PROXY_INIT_REDIRECT
    2019/10/11 12:52:18 Will ignore port 4191 on chain PROXY_INIT_REDIRECT
    2019/10/11 12:52:18 Will redirect all INPUT ports to proxy
    2019/10/11 12:52:18 > iptables -t nat -F PROXY_INIT_OUTPUT
    2019/10/11 12:52:18 <
    2019/10/11 12:52:18 > iptables -t nat -X PROXY_INIT_OUTPUT
    2019/10/11 12:52:18 < iptables: Too many links.
    
    2019/10/11 12:52:18 Ignoring uid 2102
    2019/10/11 12:52:18 Will ignore port 443 on chain PROXY_INIT_OUTPUT
    2019/10/11 12:52:18 Redirecting all OUTPUT to 4140
    2019/10/11 12:52:18 Executing commands:
    2019/10/11 12:52:18 > iptables -t nat -N PROXY_INIT_REDIRECT -m comment --comment proxy-init/redirect-common-chain/1570798338
    2019/10/11 12:52:18 < iptables: Chain already exists.
    
    2019/10/11 12:52:18 Aborting firewall configuration
    Error: exit status 1
    ---
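
    When the init container crash-loops like this, its full output from the previous attempt is usually the most useful piece of information (pod name as above):

    # Full log of the last failed linkerd-init run
    kubectl -n linkerd logs linkerd-destination-77bcd7497c-57gqf -c linkerd-init --previous

    # "Chain already exists" suggests leftover PROXY_INIT_* iptables chains from an
    # earlier partial run; deleting the pod gives the replacement a fresh network namespace
    kubectl -n linkerd delete pod linkerd-destination-77bcd7497c-57gqf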
    
    opened by jurgengrech 39
  • Linkerd-proxy connection refused/closed/reset errors.

    Linkerd-proxy connection refused/closed/reset errors.

    Bug Report

    What is the issue?

    I am working on finding the root cause of increased connection drops/failures in our Linkerd-injected apps, which happen rarely. I noticed a lot of connection-related errors (Connection closed error=Service in fail-fast, Connection reset by peer (os error 104), Connection refused (os error 111), Transport endpoint is not connected (os error 107), etc.) in the proxy logs across all our services, and I am wondering whether Linkerd has anything to do with the increased failures in our apps. I'm kind of stuck finding the root cause, so I would really appreciate it if someone could explain more about these connection errors.

    How can it be reproduced?

    I am not able to reproduce the issue; it (increased connection failures) happens rarely, and there are a lot of TCP connection closed/refused errors in the proxy logs, as seen below.
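
    A sketch of commands that may help correlate these errors with a particular peer or with missing endpoints (namespace and service names are placeholders):

    # Success rate and latency per deployment, as reported by the proxies
    linkerd stat deploy -n my-ns

    # mTLS-ed edges between meshed workloads, including any that are failing
    linkerd edges deploy -n my-ns

    # Check whether the upstream on port 9092 actually had ready endpoints at the time
    kubectl -n my-ns get endpoints my-kafka-svc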

    Logs, error output, etc

    "Dec 2, 2020 @ 06:45:47.043","[ 80905.816738s]  INFO ThreadId(01) outbound:accept{peer.addr=10.244.90.101:39120 target.addr=10.244.57.132:9092}: linkerd2_app_core::serve: Connection closed error=Service in fail-fast"
    "Dec 2, 2020 @ 06:45:47.042","[ 80905.241029s]  INFO ThreadId(01) outbound:accept{peer.addr=10.244.90.101:34980 target.addr=10.244.54.131:9092}: linkerd2_app_core::serve: Connection closed error=Service in fail-fast"
    "Dec 2, 2020 @ 06:45:51.858","[    10.069981s]  WARN ThreadId(01) inbound:accept{peer.addr=10.244.33.105:37338 target.addr=10.244.33.169:80}: linkerd2_app_core::errors: Failed to proxy request: error trying to connect: Connection refused (os error 111)"
    "Dec 2, 2020 @ 06:45:51.858","[    11.203328s]  WARN ThreadId(01) inbound:accept{peer.addr=10.244.33.1:46326 target.addr=10.244.33.169:80}: linkerd2_app_core::errors: Failed to proxy request: error trying to connect: Connection refused (os error 111)"
    "Dec 2, 2020 @ 06:45:51.858","[    22.103038s]  INFO ThreadId(01) outbound:accept{peer.addr=10.244.33.169:49660 target.addr=10.244.59.53:9092}: linkerd2_app_core::serve: Connection closed error=Service in fail-fast"
    
    [  4784.452349s]  INFO ThreadId(01) inbound:accept{peer.addr=10.244.44.107:44614 target.addr=10.244.44.106:80}: linkerd2_app_core::serve: Connection closed error=connection error: Transport endpoint is not connected (os error 107)
    [  5089.706325s]  INFO ThreadId(01) inbound:accept{peer.addr=10.244.44.107:44636 target.addr=10.244.44.106:80}: linkerd2_app_core::serve: Connection closed error=connection error: Transport endpoint is not connected (os error 107)
    [  5124.234815s]  INFO ThreadId(01) inbound:accept{peer.addr=10.244.44.107:46202 target.addr=10.244.44.106:80}: linkerd2_app_core::serve: Connection closed error=connection error: Transport endpoint is not connected (os error 107)
    
    [  5318.467170s]  INFO ThreadId(01) outbound:accept{peer.addr=10.244.44.106:60420 target.addr=10.97.182.190:5432}: linkerd2_app_core::serve: Connection closed error=Connection reset by peer (os error 104)
    [  5329.525274s]  INFO ThreadId(01) outbound:accept{peer.addr=10.244.44.106:59506 target.addr=10.97.182.190:5432}: linkerd2_app_core::serve: Connection closed error=Connection reset by peer (os error 104)
    [  5331.837249s]  INFO ThreadId(01) outbound:accept{peer.addr=10.244.44.106:58566 target.addr=10.97.182.190:5432}: linkerd2_app_core::serve: Connection closed error=Connection reset by peer (os error 104)
    

    linkerd check output

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API
    
    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version
    
    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ heartbeat ServiceAccount exist
    √ control plane replica sets are ready
    √ no unschedulable pods
    √ controller pod is running
    √ can initialize the client
    √ can query the control plane API
    
    linkerd-config
    --------------
    √ control plane Namespace exists
    √ control plane ClusterRoles exist
    √ control plane ClusterRoleBindings exist
    √ control plane ServiceAccounts exist
    √ control plane CustomResourceDefinitions exist
    √ control plane MutatingWebhookConfigurations exist
    √ control plane ValidatingWebhookConfigurations exist
    √ control plane PodSecurityPolicies exist
    
    linkerd-identity
    ----------------
    √ certificate config is valid
    √ trust anchors are using supported crypto algorithm
    √ trust anchors are within their validity period
    √ trust anchors are valid for at least 60 days
    √ issuer cert is using supported crypto algorithm
    √ issuer cert is within its validity period
    √ issuer cert is valid for at least 60 days
    √ issuer cert is issued by the trust anchor
    
    linkerd-webhooks-and-apisvc-tls
    -------------------------------
    √ tap API server has valid cert
    √ tap API server cert is valid for at least 60 days
    √ proxy-injector webhook has valid cert
    √ proxy-injector cert is valid for at least 60 days
    √ sp-validator webhook has valid cert
    √ sp-validator cert is valid for at least 60 days
    
    linkerd-api
    -----------
    √ control plane pods are ready
    √ control plane self-check
    √ [kubernetes] control plane can talk to Kubernetes
    √ [prometheus] control plane can talk to Prometheus
    √ tap api service is running
    
    linkerd-version
    ---------------
    √ can determine the latest version
    √ cli is up-to-date
    
    control-plane-version
    ---------------------
    √ control plane is up-to-date
    √ control plane and cli versions match
    
    Status check results are √
    
    

    Environment

    • Kubernetes Version: v1.18.8
    • Cluster Environment: Self hosted, Deployed via kubeadm
    • Host OS: Ubuntu 18.04.2 LTS
    • Linkerd version: stable-2.9.0 (deployed using helm chart)

    Possible solution

    Additional context

    priority/P0 
    opened by prasus 37
  • Controller does not start cleanly on GKE

    Controller does not start cleanly on GKE

    This appears to have changed since the v0.3.0 release.

    When I install Conduit on GKE, the pods in the conduit namespace restart multiple times before stabilizing and entering the Running state. I'd expect them to not restart at all. For example:

    $ kubectl -n conduit get po
    NAME                          READY     STATUS    RESTARTS   AGE
    controller-5b5c6c4846-6nxb2   6/6       Running   3          2m
    prometheus-598fc79646-zl2dw   3/3       Running   0          2m
    web-85799d759c-vz2bv          2/2       Running   0          2m
    

    It's hard to track down which of the containers is causing the pod to restart, but I see this in the proxy-api container's logs:

    $ kubectl -n conduit logs controller-5b5c6c4846-6nxb2 proxy-api
    time="2018-02-28T00:36:17Z" level=info msg="running conduit version git-9ffe8b79"
    time="2018-02-28T00:36:17Z" level=info msg="serving scrapable metrics on :9996"
    time="2018-02-28T00:36:17Z" level=info msg="starting gRPC server on :8086"
    time="2018-02-28T00:36:27Z" level=error msg="Report: rpc error: code = Unavailable desc = all SubConns are in TransientFailure"
    time="2018-02-28T00:36:28Z" level=error msg="Report: rpc error: code = Unavailable desc = all SubConns are in TransientFailure"
    time="2018-02-28T00:36:28Z" level=error msg="Report: rpc error: code = Unavailable desc = all SubConns are in TransientFailure"
    ...
    time="2018-02-28T00:36:57Z" level=error msg="Report: rpc error: code = Unknown desc = ResponseCtx is required"
    

    I also see this in the conduit-proxy container's logs:

    $ kubectl -n conduit logs controller-5b5c6c4846-6nxb2 conduit-proxy
    INFO conduit_proxy using controller at HostAndPort { host: DnsName("localhost"), port: 8086 }
    INFO conduit_proxy routing on V4(127.0.0.1:4140)
    INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
    INFO conduit_proxy::transport::connect "controller-client", DNS resolved DnsName("localhost") to 127.0.0.1
    ERR! conduit_proxy::map_err turning service error into 500: Inner(Upstream(Inner(Inner(Error { kind: Inner(Error { kind: Proto(INTERNAL_ERROR) }) }))))
    WARN conduit_proxy::control::telemetry "controller-client", controller error: Grpc(Status { code: Unavailable })
    ERR! conduit_proxy::map_err turning service error into 500: Inner(Upstream(Inner(Inner(Error { kind: Inner(Error { kind: Proto(INTERNAL_ERROR) }) }))))
    ERR! conduit_proxy::map_err turning service error into 500: Inner(Upstream(Inner(Inner(Error { kind: Inner(Error { kind: Proto(INTERNAL_ERROR) }) }))))
    WARN conduit_proxy::control::telemetry "controller-client", controller error: Grpc(Status { code: Unavailable })
    ERR! conduit_proxy::map_err turning service error into 500: Inner(Upstream(Inner(Inner(Error { kind: Inner(Error { kind: Proto(INTERNAL_ERROR) }) }))))
    ERR! conduit_proxy::map_err turning service error into 500: Inner(Upstream(Inner(Inner(Error { kind: Inner(Error { kind: Proto(INTERNAL_ERROR) }) }))))
    WARN conduit_proxy::control::telemetry "controller-client", controller error: Grpc(Status { code: Unavailable })
    ERR! conduit_proxy::map_err turning service error into 500: Inner(Upstream(Inner(Inner(Error { kind: Inner(Error { kind: Proto(INTERNAL_ERROR) }) }))))
    WARN conduit_proxy::control::telemetry "controller-client", controller error: Grpc(Status { code: Unknown })
    

    If I had to guess, I think these errors are likely a result of the Go processes trying to route traffic before the proxy has initialized, and this behavior changed in #365.

    Here's the version I'm testing against:

    $ ./bin/conduit version
    Client version: git-9ffe8b79
    Server version: git-9ffe8b79
    
    area/controller priority/P0 
    opened by klingerf 37
  • Linkerd 2.4.0: request aborted because it reached the configured dispatch deadline

    Linkerd 2.4.0: request aborted because it reached the configured dispatch deadline

    Bug Report

    What is the issue?

    All deployments where linkerd was injected errored out at the same time with:

    proxy={server=out listen=127.0.0.1:4140 remote=10.10.45.84:41846} linkerd2_proxy::app::errors request aborted because it reached the configured dispatch deadline
    

    How can it be reproduced?

    I do not really know; run 2.4.0 in an environment where there is a lot of container churn and only some deployments are meshed. It took around 12 hours of runtime (constant traffic) to cause the error.

    Logs, error output, etc

    The mentioned log message appears in all deployments. Otherwise there is nothing.

    linkerd check output

    kubernetes-api
    --------------
    √ can initialize the client
    √ can query the Kubernetes API
    
    kubernetes-version
    ------------------
    √ is running the minimum Kubernetes API version
    √ is running the minimum kubectl version
    
    linkerd-config
    --------------
    √ control plane Namespace exists
    √ control plane ClusterRoles exist
    √ control plane ClusterRoleBindings exist
    √ control plane ServiceAccounts exist
    √ control plane CustomResourceDefinitions exist
    √ control plane MutatingWebhookConfigurations exist
    √ control plane ValidatingWebhookConfigurations exist
    √ control plane PodSecurityPolicies exist
    
    linkerd-existence
    -----------------
    √ 'linkerd-config' config map exists
    √ control plane replica sets are ready
    √ no unschedulable pods
    √ controller pod is running
    √ can initialize the client
    √ can query the control plane API
    
    linkerd-api
    -----------
    √ control plane pods are ready
    √ control plane self-check
    √ [kubernetes] control plane can talk to Kubernetes
    √ [prometheus] control plane can talk to Prometheus
    √ no invalid service profiles
    
    linkerd-version
    ---------------
    √ can determine the latest version
    √ cli is up-to-date
    
    control-plane-version
    ---------------------
    √ control plane is up-to-date
    √ control plane and cli versions match
    
    Status check results are √
    

    Environment

    • Kubernetes Version: 1.14.1
    • Cluster Environment: self managed
    • Host OS: CoreOs 2079.3.0
    • Linkerd version: 2.4.0

    Possible solution

    Additional context

    priority/P0 bug needs/repro 
    opened by bjoernhaeuser 35
  • High memory usage on a destination controller

    High memory usage on a destination controller

    Bug Report

    What is the issue?

    Communication from Linkerd-injected pods to other Linkerd-injected pods suddenly stops. Restarting the control plane resolves it instantly.
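
    For anyone hitting this, a quick check plus the restart workaround mentioned above looks roughly like the following (restarting only the destination controller is a narrower variant of that workaround, not something confirmed in the report):

    # Watch memory usage of the control-plane pods (requires metrics-server)
    kubectl -n linkerd top pod

    # Restart just the destination controller instead of the whole control plane
    kubectl -n linkerd rollout restart deploy/linkerd-destination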

    area/proxy bug 
    opened by kforsthoevel 35
  • linkerd-control-plane chart: support configuring PodMonitor labels

    linkerd-control-plane chart: support configuring PodMonitor labels

    What problem are you trying to solve?

    With podMonitor.enabled, the resulting PodMonitors lack the release label and therefore can't be selected by kube-prometheus-stack.

    How should the problem be solved?

    Support configuring PodMonitor labels, e.g. via a podMonitor.extraLabels value.
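
    Until the chart exposes this, one workaround is to label the generated PodMonitors directly; the helm values in the comment below are the requested (hypothetical) feature, not something the chart currently supports:

    # Workaround today: add the label kube-prometheus-stack selects on
    kubectl -n linkerd label podmonitors.monitoring.coreos.com --all release=kube-prometheus-stack

    # Proposed usage once something like podMonitor.extraLabels exists (hypothetical):
    #   helm upgrade linkerd-control-plane linkerd/linkerd-control-plane \
    #     --set podMonitor.enabled=true \
    #     --set podMonitor.extraLabels.release=kube-prometheus-stack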

    Any alternatives you've considered?

    FluxCD HelmRelease postRenderers.

    How would users interact with this feature?

    No response

    Would you like to work on this feature?

    None

    enhancement 
    opened by uqix 1
  •  Connection header illegal in HTTP/2: connection

    Connection header illegal in HTTP/2: connection

    What is the issue?

    Some of our production services are dropping requests with:

    inbound:server{port=<port>}: hyper::proto::h2: Connection header illegal in HTTP/2: connection
    

    and on the outbound / source pods we see 504s along with:

    linkerd_app_outbound::http::proxy_connection_close: Closing application connection for remote proxy error=connect timed out after 100ms
    

    As I understand it, this means the proxy is receiving a request with a Connection header set, which is invalid in HTTP/2. The only real reference I can see to this is here: https://github.com/linkerd/linkerd2/issues/1407 which suggests the proxy is supposed to drop headers that are invalid rather than failing the request.

    As it stands, we are dropping production requests between two services. I know the source of the traffic thanks to Linkerd logging the client IP, but it doesn't log the failed request, so I don't know what is happening between the services in order to fix it. Debug logs and tap do not show it either.

    Ideally the proxy would not fail these requests and would just strip out the illegal headers. If it has to fail them, more information to help identify the requests would be great, such as logging the request headers/body.

    How can it be reproduced?

    Will confirm this. I suspect sending any http/1 request with a connection header to a meshed pod will trigger it.
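
    A minimal reproduction attempt along those lines might look like this (service, namespace, and port are placeholders):

    # From another meshed pod, send an HTTP/1.1 request that explicitly carries a
    # Connection header and watch the destination's proxy logs for the warning
    kubectl -n my-ns exec deploy/client -c app -- \
      curl -sv -H "Connection: keep-alive" http://my-svc.my-ns.svc.cluster.local:8080/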

    Logs, error output, etc

    INFO ThreadId(02) inbound:server{port=8080}:rescue{client.addr=10.133.71.3:59588}: linkerd_app_core::errors::respond: Request failed error=error trying to connect: connect timed out after 100ms error.sources=[connect timed out after 100ms]
    
    WARN ThreadId(02) inbound:server{port=8080}: hyper::proto::h2: Connection header illegal in HTTP/2: connection
    

    output of linkerd check -o short

    Status check results are √

    Environment

    • Kubernetes Version: v1.22.15-gke.100
    • Linkerd version: 2.12.2

    Possible solution

    No response

    Additional context

    No response

    Would you like to work on fixing this bug?

    None

    bug 
    opened by dwilliams782 0
  • linkerd-control-plane issuer is not able to reference an existing secret containing the certificate and keys required for Linkerd deployment

    linkerd-control-plane issuer is not able to reference an existing secret containing the certificate and keys required for Linkerd deployment

    What problem are you trying to solve?

    Deploying Linkerd with the Helm chart, with IaC (Infrastructure as Code) in mind, currently requires copy-pasting raw certificates and keys into a .yaml file.

    How should the problem be solved?

    Make the deployment able to reference an external secret.

    Any alternatives you've considered?

    No

    How would users interact with this feature?

    Pass an argument allowing the deployments to reference an existing secret containing crt.pem and key.pem.
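
    For context, the documented Helm install currently takes the certificate material as files or inline values, e.g. (flag names as in the linkerd-control-plane chart docs; exact values may differ by chart version):

    # Current approach: pass the issuer credentials in at install time
    helm install linkerd-control-plane linkerd/linkerd-control-plane \
      -n linkerd \
      --set-file identityTrustAnchorsPEM=ca.crt \
      --set-file identity.issuer.tls.crtPEM=issuer.crt \
      --set-file identity.issuer.tls.keyPEM=issuer.key

    # The request here is to instead point the chart at an existing Secret
    # containing crt.pem / key.pem, so no PEM data needs to live in values files.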

    Would you like to work on this feature?

    yes

    enhancement 
    opened by mainey 0
  • convert ServerAuthorizations to AuthorizationPolicies

    convert ServerAuthorizations to AuthorizationPolicies

    The Linkerd extension charts use ServerAuthorization resources. AuthorizationPolicy is now the recommended resource to use instead of ServerAuthorization. This change replaces all of the ServerAuthorization resources in the Linkerd extension charts with AuthorizationPolicy resources.
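
    As a rough illustration of the shape of the change (resource names are made up; field names per the policy.linkerd.io documentation), a ServerAuthorization that allowed a ServiceAccount to reach a Server becomes an AuthorizationPolicy targeting that same Server:

    # Sketch only: the AuthorizationPolicy equivalent of a ServerAuthorization that
    # allowed ServiceAccount "my-client" to reach Server "my-server"
    kubectl apply -f - <<'EOF'
    apiVersion: policy.linkerd.io/v1alpha1
    kind: AuthorizationPolicy
    metadata:
      name: my-server-authz
      namespace: my-ns
    spec:
      targetRef:
        group: policy.linkerd.io
        kind: Server
        name: my-server
      requiredAuthenticationRefs:
        - kind: ServiceAccount
          name: my-client
    EOF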

    Signed-off-by: Alex Leong [email protected]

    opened by adleong 0
  • Expose configuration for outbound proxy cache eviction timeout

    Expose configuration for outbound proxy cache eviction timeout

    opened by jeremychase 1