Elevated `badNonce` and `malformed` errors when trying to issue certificates #17

Closed
by Ghost opened 5 years ago · 10 comments
Ghost commented 5 years ago

I'm seeing lots of intermittent errors while issuing certificates since I upgraded my service to greenlock from letsencrypt. About 90% of the errors are badNonce, and the rest are malformed. Retrying a few times usually results in getting a certificate issued.
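
For readers hitting the same errors: RFC 8555 (section 6.5) says a client should retry a request that was rejected with badNonce, using the fresh nonce the server returns in the Replay-Nonce header. Below is a minimal sketch of that retry loop. `retryOnBadNonce`, `isBadNonce`, and the assumption that the ACME problem document is reachable as `err.body` (or is the error object itself) are hypothetical and not part of greenlock or acme-v2. The sketch also assumes the wrapped function obtains a new nonce on every attempt.

```javascript
'use strict';

// ACME error type reported in the trace for rejected nonces.
const BAD_NONCE = 'urn:ietf:params:acme:error:badNonce';

// Assumption: the problem document is either attached as `err.body`
// or is the error object itself. Real clients may expose it differently.
function isBadNonce(err) {
  const problem = (err && err.body) || err || {};
  return problem.type === BAD_NONCE;
}

// Hypothetical helper: calls `fn` up to `maxAttempts` times, retrying only
// on badNonce and waiting a little longer before each new attempt.
async function retryOnBadNonce(fn, maxAttempts = 3, delayMs = 250) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (!isBadNonce(err)) {
        throw err; // any other error is not retried here
      }
      lastErr = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastErr;
}
```

Usage would look roughly like `retryOnBadNonce(() => issueCertificate(domain))`, where `issueCertificate` stands in for whatever call triggers issuance in your service.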

Looks like this is related to #7.

We are on node v10.15.0 using

  • Greenlock v2.6.7
  • acme-v2 v1.5.2
  • rsa-compat v1.9.2

All of them are the latest versions, except rsa-compat, but looking at the changes, I don't see anything that would fix the problem by bumping to v2.0.2. In any case, rsa-compat@1.9.2 is a dependency of acme-v2@1.5.2, so I can't really bump it until there is a new acme-v2 release.

badNonce error trace -

```
Error: [acme-v2.js] authorizations were not fetched for 'doc.mail.freenet.de':
{"type":"urn:ietf:params:acme:error:badNonce","detail":"JWS has an invalid anti-replay nonce: \"{{nonce}}\"","status":400}
    at /var/www-api/node_modules/acme-v2/node.js:620:33
    at process._tickCallback (internal/process/next_tick.js:68:7)
```

malformed error trace -

```
Didn't finalize order: Unhandled status '400'. This is not one of the known statuses...
Requested: '{{domain}}'
Validated: '{{domain}}'
{
  "type": "urn:ietf:params:acme:error:malformed",
  "detail": "Order's status (\"processing\") is not acceptable for finalization",
  "status": 400
}

Please open an issue at https://git.coolaj86.com/coolaj86/acme-v2.js
```
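
The malformed trace above says a finalize request arrived while the order was already in the processing state. Under RFC 8555 (section 7.1.6) an order moves through pending, ready, processing, and valid (or invalid). Only a ready order can be finalized, and a processing order is meant to be polled until it becomes valid. A rough sketch of that decision follows. The function name and the action strings are made up for illustration and do not come from acme-v2.

```javascript
'use strict';

// Maps an ACME order status (RFC 8555) to the next client step.
// The returned action names are hypothetical labels, not library calls.
function nextOrderAction(status) {
  switch (status) {
    case 'pending':
      return 'complete-authorizations'; // challenges still outstanding
    case 'ready':
      return 'finalize'; // the only state in which finalize is accepted
    case 'processing':
      return 'poll'; // wait and re-fetch the order, do not finalize again
    case 'valid':
      return 'download-certificate';
    case 'invalid':
      return 'abort';
    default:
      throw new Error('Unknown order status: ' + status);
  }
}
```

With a check like this, a client that sees "processing" would re-fetch the order after a delay instead of sending a second finalize request.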

Hi,

Is there anything I can do to help debug this?

According to the Let's Encrypt folks, a badNonce error can occur when requests are made from different IP addresses or when [the SSL connection is not reused](https://community.letsencrypt.org/t/jws-has-an-invalid-anti-replay-nonce-when-client-behind-nat/66493/3?u=elssar) -

> dehydrated is a shell script that runs a separate curl command for each request, **which means each request is made on a new connection, which means you get assigned a new source IP address**. Most likely a client that reuses an HTTPS connection over multiple requests would have much less trouble. That rules out shell-based clients, but other clients should work reasonably well.

Looking at the code for urequest, it looks like it uses the default https global agent, and does not set keepAlive to true. I tried to set the default to true, but that didn't seem to help.
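
For reference, this is roughly what connection reuse looks like with Node's built-in `https` module: a dedicated `https.Agent` created with `keepAlive: true` and passed explicitly in the request options. It is only a sketch. `acmeRequestOptions` is a hypothetical helper, how urequest builds its options is not shown here, and as noted above, enabling keepAlive alone did not resolve the errors in this case.

```javascript
'use strict';

const https = require('https');

// A dedicated agent that keeps sockets open so later requests to the
// same host can reuse them instead of opening a new connection.
const keepAliveAgent = new https.Agent({ keepAlive: true });

// Hypothetical helper: builds an options object for https.request()
// that routes the request through the keep-alive agent.
function acmeRequestOptions(url, method) {
  const u = new URL(url);
  return {
    protocol: u.protocol,
    hostname: u.hostname,
    port: u.port || 443,
    path: u.pathname + u.search,
    method: method || 'GET',
    agent: keepAliveAgent
  };
}
```

A request would then be issued with `https.request(acmeRequestOptions(someUrl, 'POST'), callback)`, where `someUrl` is whichever ACME endpoint is being called.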

@coolaj86 any ideas on what I might be missing?

Owner

@elssar Thanks so much for this!

That makes a lot of sense, but I never would have guessed.

I'll take a look at urequest and make sure that

  1. agent options can be passed in
  2. the correct agent options are used by acme-v2
Owner

> Node.js maintains several connections per server to make HTTP requests. This function allows one to transparently issue requests.

That may explain why even the default agent is having this issue.

I'm about to push a change to urequest that will allow agent to be passed in. I think that creating an agent with only one request per server may help. We shall see.
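
A sketch of the idea, assuming Node's standard `https.Agent` options: setting `maxSockets: 1` together with `keepAlive: true` limits the agent to one socket per host, so requests to that host are queued on a single connection rather than spread across several. `createSingleSocketAgent` is a hypothetical name, and whether this actually removes the badNonce errors is untested here.

```javascript
'use strict';

const https = require('https');

// Hypothetical factory: an agent limited to a single persistent socket
// per host, so consecutive requests share one connection.
function createSingleSocketAgent() {
  return new https.Agent({
    keepAlive: true, // keep the socket open between requests
    maxSockets: 1, // at most one concurrent socket per host
    maxFreeSockets: 1 // keep at most one idle socket around
  });
}

const singleSocketAgent = createSingleSocketAgent();
```

With plain Node the agent is passed as `https.get(url, { agent: singleSocketAgent }, callback)`. The equivalent for urequest depends on the `agent` option described in this comment.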

Owner

I've been doing a number of things tonight. I did modify urequest@1.3.7 to respect agent when passed in. I'll try to test the rest tomorrow, but feel free to try it and let me know.

Also, I don't get these errors myself. If you come up with a way to reproduce it that would be awesome.

@coolaj86 I haven't been able to reproduce this outside of my production environment, but I think I have an idea how to do it. Will let you know if I am successful.

Owner

Any updates?

I'd be interested to know if this still happens in Greenlock v2.7+.

I made a bunch of updates, added a bunch of tests, and got some corner cases taken care of.

I don't think anything that I did would directly affect this but, aside from the backwards compatibility shims which became more complex, the core flow of the code is a lot simpler now.


@coolaj86, sorry I couldn't reproduce this in my beta environment, and since this was causing problems in production we moved back to the letsencrypt module. Will revisit this in the next couple of weeks.

Owner

We now offer business support at an hourly rate, so if you're interested in pairing together for an hour to look at specifics send a message to aj@therootcompany.com and we can schedule some time to take a look.

Owner

Fixed in v3

coolaj86 closed this issue 4 years ago