Dry-run self diagnostics question #28

Closed
opened 2018-12-19 16:21:24 +00:00 by Ghost · 16 comments

Hey there,

I'm working with greenlock behind a load-balanced configuration that ultimately has multiple A records (this is an implementation detail of the AWS services I'm using). Only one of these IPs is configured to handle cert issuance on port 80.

I'm running into an interesting issue where, before the requests to Let's Encrypt begin, greenlock fails the built-in self diagnostics referenced in the docs. From what I can tell, greenlock is requesting itself by IP address, not by domain name, and failing intermittently due to what I assume is DNS round-robin.

Let me know if you can provide any insight around this issue 😃

Owner

Ah, but if I dig up the option to turn that check off (which does exist and I’ll consider putting in the readme), then when the requests from Let’s Encrypt come in you’ll fail the challenge.

I’ll double check, but I believe the Host header has the correct hostname. It’s just plain http though, not https, at that point.

Also, is your load balancer doing SNI-based routing in the first place? Otherwise the encrypted requests will fail even if the challenge succeeded.

Instead of circumventing the check (because it’s probably doing the right thing), you’ll probably want to use the AWS Route 53 DNS challenge plugin as seen at https://git.coolaj86.com/coolaj86/greenlock-express.js and I think there’s an AWS S3 plugin for certificate storage as well.

I know I’m being presumptuous by essentially saying “you’re wrong, the tool is right”, but it’s only because this is a common question and usually the person asking it is missing some context.

That said, if you know what you’re doing and you’re sure let me know and I’ll dig up the option you’re looking for.

Author

My configuration is a bit unique: I'm using the `greenlock.register` API coupled with `greenlock.middleware` in Express to dynamically issue certs. The Node app is deployed to a Fargate cluster and load balanced behind a Network Load Balancer that is configured to route traffic correctly (the port 80 listener routes to a service that handles cert issuance/challenges and redirects to HTTPS; the port 443 listener routes to the actual application server, which terminates SSL).

Under the hood, NLBs provision multiple load balancer nodes, which results in multiple A records for an NLB host. Normally, if the request comes in via the domain name, AWS knows how to resolve things. If the request is made directly to an IP, there's no guarantee it's going to hit the correct LB node.

Unfortunately, the DNS challenge will not work for my use case -- I need to use the HTTP challenge.

Just for clarification, when you say "the encrypted requests will fail", are you referring to inbound HTTPS requests from Let's Encrypt that occur as a result of the certificate issuance process, or to HTTPS requests subsequent to installing a cert?

I was operating under the assumption that any inbound requests from Let's Encrypt would occur unencrypted on port 80.

I took a look at the source for your acme-v2 package, but it was unclear exactly how the dry-run request is made -- my server logs indicate it's using an IP address.

Owner

Give me a few hours to look into it and get back to you. Server logs don’t lie... unless they do... but it sounds like the problem is on my end so I’ll poke a bit.

Owner

(a couple hours because I’m making an hour-long trip right now)

Author

Thank you, much appreciated! Let me know if there's any other information I can provide you with or help debug in any way.

Owner

Are you sure that it isn't the tool you're using to initiate the request that's sending the IP address instead of the hostname (i.e. software on the load balancer dropping the Host header)?

I double checked the source of acme-v2 and urequest and they're both doing the right thing. I'm coming up with a mini test case to triple check.

Owner

I just took a minute to triple check, and it does send the Host header as configured.

What does your package-lock.json look like? Do you have any versions that are older than the following?

  • greenlock v2.5
  • acme-v2 v1.3.1
  • @coolaj86/urequest v1.3.6
Author

Yep, running identical versions to what you have listed. I suppose it's possible the Host header is being dropped somewhere. I will look into that.

So just to be clear, the dry run basically makes a request to the provided domain in an attempt to validate that the challenge will pass?

Also... I've attached a screenshot of what I'm seeing in the logs. Maybe it'll help shed some light on things. From this, it looks like the Host header is set to a raw IP address instead of the actual domain name, which breaks routing in AWS.

Author

@coolaj86 After much code-spelunking through greenlock, acme-v2, and urequest, I think I've figured out what's going on!

As it turns out, this appears to be an issue with how the Node.js built-in HTTP module handles DNS resolution. If DNS resolution yields multiple A records, it will only attempt the first IP address (whereas most other HTTP clients will attempt all of them in parallel). Since your urequest module is built on the native HTTP module, this explains my ETIMEDOUT error.

NodeJS issue documenting this behavior: https://github.com/nodejs/node/issues/708

Still in search of a solution, but perhaps this can be useful to anyone else who may encounter the same problem.
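If it helps, the failover behavior I was expecting could be bolted on by hand: resolve the A records yourself and try them in order. A rough sketch (the function and its wiring are made up, not greenlock code), with the actual HTTP attempt injected so the retry logic is visible:

```javascript
// Try a list of resolved addresses one at a time, falling back to the
// next address when an attempt fails -- the failover that the built-in
// http module does not do on its own.
function tryEachAddress(addresses, attempt, done) {
  var i = 0;
  function next(err) {
    if (i >= addresses.length) {
      return done(err || new Error('no addresses left to try'));
    }
    var addr = addresses[i++];
    attempt(addr, function (err, result) {
      if (err) { return next(err); } // this IP failed; try the next one
      done(null, result);
    });
  }
  next();
}
```

In practice `addresses` would come from `dns.resolve4(hostname)` and `attempt` would issue the HTTP request to that IP while keeping the original hostname in the Host header.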

Owner

That doesn't make sense.

How do you expect Let's Encrypt to pass if your DNS records are wrong (pointing to computers that are dropping TCP packets destined to port 80) and you're using HTTP validation?

HTTP clients typically *do not* attempt all DNS addresses in parallel. However, they often do fail over when one doesn't respond.

That said, Let's Encrypt is going to (sometimes) get the wrong IP address the same as Node does.

I think the real question is why do you have invalid IP addresses (or correct addresses to computers that drop TCP:80 packets) in your DNS records?

Nevertheless, the option is `skipChallengeTest: true`. When you pass that to greenlock I believe it propagates the whole way down to `acme-v2`.
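Roughly, that would look something like this in your setup code (a hypothetical sketch I haven't verified against your config; adjust to your actual options):

```javascript
// Hypothetical sketch: pass skipChallengeTest through the create() options
// to disable the dry-run self diagnostic. Only do this if you're sure the
// real validation request from Let's Encrypt can still reach your server.
var options = {
  server: 'https://acme-v02.api.letsencrypt.org/directory',
  skipChallengeTest: true // skip the pre-flight self check
  // ...plus your existing approveDomains / store / challenge config
};
// var greenlock = require('greenlock').create(options);
```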

Owner

On the flip side, there's this thing called "hairpin routing" that is easy to get misconfigured, which would also cause valid IP addresses to get rejected when the requests come from inside the network.

I'd be curious to do a little testing if you'd like to send your domain name and IP address to me at coolaj86@gmail.com.

Author

Regardless of whether they attempt the requests in parallel or via failover, the Node.js HTTP module does neither -- therefore the dry run is guaranteed to fail intermittently. It's totally possible Let's Encrypt's HTTP implementation behaves the same way; I just haven't gotten that far yet.

I'm not familiar with hairpin routing, but I'll dig into that as well and see what it turns up.

My DNS records are not wrong per se; AWS is just creating multiple A records for my load balancer's hostname. For example, when you create a load balancer it gives you something like my-load-balancer123.amazonaws.com, which you can point your own domain at with a CNAME. As far as I can tell, as I add listeners to the load balancer (443 and 80), it provisions compute nodes under the hood for each listener and creates an A record on the my-load-balancer123 host to route to them.

When I get the ETIMEDOUT error, it's because the request was attempted on the underlying LB node that's listening on port 443, not port 80. It seems like AWS is just assuming all HTTP clients will implement some kind of failover behavior.

Unfortunately, I don't think I have any control over this implementation detail. One option I've experimented with is to attach an Elastic IP to the LB, which allows me to point my domain with an A record instead of the CNAME and ensures all requests route through a single IP -- this solves my problem (although I lose the ability to have automatic failover over multiple AZs, but that's a different story...).

I don't think this is a greenlock issue at all. But thank you for taking the time to work through this with me and all your work on greenlock!

Owner

Why not use something simple and easy, like DigitalOcean, Vultr, or Linode?

Owner

Recently it seems like _everyone_ I know and their dog is trying to use AWS to solve _every_ problem. I even see junior devs trying to use it. It completely baffles me (though I'm from a bygone era - an old man who yells at _the cloud_, as it were).

Unless you're an expert at multi-million dollar devops deployments and you're servicing a massive enterprise that has to have complex global deployments, it's probably not worth the 10x-100x Amazon cost premium, or the education to learn their proprietary ways of doing things.

AWS makes complicated things possible, but it makes simple things very complicated.

This is all coming from a guy who is on the complete opposite end of the spectrum - writing DNS and TLS tools in JavaScript for IoT devices... so take that for what it's worth...

Author

I would always prefer simple and easy, but this project warrants a heavier-handed approach. I've actually found ECS/Fargate to be a pretty good experience once you figure out all the jargon. It's a big upfront investment in terms of learning curve and configuration... but once it's set up you end up with an infinitely scalable, highly available, and durable app. Pretty incredible stuff.

Thanks again for all your work on greenlock -- I'm hoping to find some time to contribute some additional docs for the storage and challenge plugin APIs.

Owner

Ah, I see.

Well, you're welcome and that would be a great contribution.

Reference: coolaj86/greenlock.js-ARCHIVED#28