Running Cheap gRPC Web with Docker Containers on ECS and EC2
Published on 1 March 2023

This article summarizes my quest to stubbornly run a cheap Dockerized gRPC server capable of interacting with gRPC Web. I was trying to set up a staging instance (not a production instance) for my startup.
In this article, you can learn one of two things:
- How to deploy Docker containers exposed to the internet for cheap on AWS (price of a micro instance)
- How to deploy a gRPC web server with Envoy on AWS ECS
In this example, we use a TypeScript Node server, but you can generalize it to any gRPC server.
I’m aware there are other providers, but I have AWS credits, and for production, I do, for better or worse, have brand loyalty to AWS. Eventually, I’d need to figure out how to deploy there anyway.
Background
I won’t be explaining everything from scratch; I’ll assume some basic knowledge of what gRPC, ECS, EC2, VPCs, and subnets are.
ECS allows you to deploy containers. The service itself is free. However, you can either use AWS Fargate, which is really, really expensive but really convenient, or use an EC2 instance to back the containers. Being cheap and also refusing to back down due to my ego, I decided to forgo the convenience of Fargate and try to see if I could spin up the containers on a micro instance.
To contact a gRPC server from a web browser, you need a proxy (typically Envoy) that translates gRPC-Web requests into regular gRPC, so it’s not as simple as spinning up a server.
In this article, we’ll be spinning up two containers in one task: the Envoy proxy and the Node gRPC server.
Bridge networking vs awsvpc
You can find more details about these networking modes on the internet. I spent a long time trying to get awsvpc to work, but it turns out that you can’t assign a public IP to ECS tasks on the EC2 launch type. This answer gives two ways to work around it, but otherwise, there’s no way to easily expose the container to the internet.
Exposing the task is possible with a load balancer: you can have the load balancer accept public inbound traffic and then direct it into your VPC. But again – I’m trying to spend, at maximum, $10 a month here. A load balancer is another $10, just for my staging instance!
In bridge networking, you can map host ports to ports that the container listens to. We map the same container port to the same host port in this case for convenience, but this also means that, if your EC2 instance is exposed to the internet at a certain port, then the ECS container is also exposed to the internet.
In our case, our EC2 instance exposes port 443 to the internet (through the security group rules). Our Envoy proxy listens on container port 443, which is mapped to the EC2 instance’s port 443. Envoy then forwards traffic to host port 9090 (NOT a port on the Envoy container, as explained in the config below), which is the port the Node server listens on.
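If it helps to see the same flow outside of ECS, the equivalent bridge-mode mappings with plain Docker would look roughly like this (the image names are just local tags, not the exact ECR images):
# Node gRPC server: container port 9090 published on host port 9090
docker run -d -p 9090:9090 sheets-node
# Envoy: container port 443 published on host port 443; it proxies to 172.17.0.1:9090 on the host
docker run -d -p 443:443 sheets-envoy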
Envoy Dockerfile
The --platform=linux/amd64 flag is required when building on M1 Macs; without it, the image is built for ARM and won’t run on the x86 EC2 instance.
FROM --platform=linux/amd64 envoyproxy/envoy-dev:e515b02318e55f7f8bcef8db1fde21c1d46990b0
COPY envoy-staging.yaml /etc/envoy/envoy.yaml
RUN chmod go+r /etc/envoy/envoy.yaml
CMD /usr/local/bin/envoy -c /etc/envoy/envoy.yaml
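Assuming this Dockerfile sits in its own directory alongside envoy-staging.yaml, building the image is just (the tag is only an example; use whatever matches your ECR repository):
docker build -t sheets-envoy:dhost .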
Envoy Config
The key trick in the Envoy config is on the commented lines below: in bridge networking, the address 172.17.0.1 lets a container reach the host. Here is the file envoy-staging.yaml, referenced by the Dockerfile above:
# Taken mostly from grpc-web envoy config in hello world demo
# https://github.com/grpc/grpc-web/tree/master/net/grpc/gateway/examples/helloworld
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 443 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                codec_type: auto
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: web_backend
                            max_stream_duration:
                              grpc_timeout_header_max: 0s
                      cors:
                        allow_origin_string_match:
                          - prefix: "*"
                        allow_methods: GET, PUT, DELETE, POST, OPTIONS
                        allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                        max_age: "1728000"
                        expose_headers: custom-header-1,grpc-status,grpc-message
                http_filters:
                  - name: envoy.filters.http.grpc_web
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
                  - name: envoy.filters.http.cors
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: web_backend
      connect_timeout: 0.25s
      type: logical_dns
      http2_protocol_options: {}
      lb_policy: round_robin
      load_assignment:
        cluster_name: cluster_0
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      # This assumes bridge networking mode, now we want to
                      # forward to the host port
                      # https://stackoverflow.com/a/68021591/892168
                      address: 172.17.0.1
                      port_value: 9090
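If you want to confirm that 172.17.0.1 really is the Docker bridge gateway on your EC2 instance (it is by default, but it can differ on customized Docker setups), you can check on the instance itself:
# The gateway of the default bridge network is the address Envoy uses to reach the host
docker network inspect bridge | grep Gateway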
TypeScript gRPC Server Dockerfile
Make sure to replace yarn dockerbuild with your own build command. In dockerbuild, I skip the step of regenerating the protobufs due to some particular details of my directory structure, but you could just run yarn build here. Also note that my build directory is build/, though I think this might be dist/ by default. Either way, you can find guides online for how to build an image for your server.
# https://www.andreadiotallevi.com/blog/how-to-create-a-production-image-for-a-node-typescript-app-using-docker-multi-stage-builds
# TODO: This uses yarn dockerbuild right now, which doesn't update the protos.
# Have to be careful.

# Build stage: install all dependencies and compile the TypeScript
FROM --platform=linux/amd64 node:16-alpine AS builder
WORKDIR /app
COPY . .
RUN yarn install
RUN yarn dockerbuild

# Final stage: copy only the compiled output and install production dependencies
FROM --platform=linux/amd64 node:16-alpine AS final
WORKDIR /app
COPY --from=builder /app/build ./build
COPY package.json yarn.lock ./
RUN yarn install --production
CMD [ "yarn", "start" ]
EC2 Instance
You want to use an ECS-compatible AMI. The exact AMI that I’m using is ami-06502972b2860f143 (amzn2-ami-ecs-hvm-2.0.20230109-x86_64-ebs), which is the standard ECS-optimized Amazon Linux 2 image provided by Amazon Web Services.
It’s important to set the user data here, as this custom setup is essentially the trickiest part of not using Fargate. When you set up the instance, there will be a space for user data, for which you should provide:
#!/bin/bash
echo ECS_CLUSTER=sheets-staging-cluster >> /etc/ecs/ecs.config
echo ECS_LOGLEVEL=debug >> /etc/ecs/ecs.config
echo "ECS_AVAILABLE_LOGGING_DRIVERS=[\"awslogs\",\"json-file\"]" >> /etc/ecs/ecs.config
echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config
Double checking the ECS config
In this section, I’ll give some debugging tips in case things don’t go as planned.
Be careful with the quoting in the user data above. If you’re running into issues related to missing attributes, SSH into the instance and check the ECS config:
$ cat /etc/ecs/ecs.config
ECS_CLUSTER=sheets-staging-cluster
ECS_LOGLEVEL=debug
ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file","syslog"]
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true
You can manually change the config of the instance, but you may need to restart the ECS agent:
sudo systemctl stop ecs && sudo systemctl start ecs
Finally, you can check which environment variables the ecs-agent container actually has configured:
docker inspect ecs-agent | grep Env -A 15
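You can also confirm from the AWS side that the instance actually registered with the cluster (assuming the AWS CLI is configured for the right region):
aws ecs list-container-instances --cluster sheets-staging-cluster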
ECS Task Definition
Create an ECS cluster in your desired region. Upload your Docker images to Amazon ECR (steps not included in this article). We’re going to create a task definition, and then instantiate a service using it.
You can use the UI to set up something with the ports as you desire. I’ve also included the JSON version of my task definition below. The important things are:
- Two containers: one for Envoy, one for Node
- Automatic log configuration using awslogs as the log driver
- Bridge networking mode
- Required ports
The logs are optional, but I highly recommend them, because I (like many people) ran into issues with my containers that didn’t occur when running them locally. For example, I was building them on an M1 Mac. Without logging, you won’t be able to see these errors, and your tasks will fail with no visible reason.
It should be noted that, at the time of writing, the log configuration can only be set via the UI during the initial creation of the task definition; if you edit the task definition later, you have to enter it through the JSON.
Error when setting up logging
It ended up being a huge headache for me, because if you don’t configure the EC2 instance properly, you’ll see an error that says something like:
is missing an attribute required by your task
The internet suggests using the ECS CLI to check attributes, but for me, it said that all attributes were fine. The error message itself, unfortunately, is not very clear. What it really means is that you haven’t allowed the correct log drivers on the EC2 instance.
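If you want to inspect the registered attributes yourself, the plain AWS CLI can also show them (the container instance ID is a placeholder; you can get it from aws ecs list-container-instances):
aws ecs describe-container-instances --cluster sheets-staging-cluster --container-instances <container-instance-id> --query 'containerInstances[].attributes[].name'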
If you are still seeing this message, then you should double check that:
- you’re not using awsvpc networking (in which case your security group might be wrong)
- you set up the user data in the EC2 instance correctly, as described above
- the user data declares the allowable log drivers correctly
- ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE is set
JSON Task Definition
{
  "family": "envoy-node-bridge",
  "containerDefinitions": [
    {
      "name": "sheets-envoy",
      "image": "280706339600.dkr.ecr.us-east-2.amazonaws.com/sheets-envoy:dhost",
      "cpu": 0,
      "portMappings": [
        {
          "name": "sheets-envoy-443-tcp",
          "containerPort": 443,
          "hostPort": 443,
          "protocol": "tcp",
          "appProtocol": "grpc"
        }
      ],
      "essential": true,
      "environment": [],
      "environmentFiles": [],
      "mountPoints": [],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/envoy-node-bridge",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    },
    {
      "name": "sheets-node",
      "image": "280706339600.dkr.ecr.us-east-2.amazonaws.com/sheets-node:latest",
      "cpu": 0,
      "portMappings": [
        {
          "name": "sheets-node-9090-tcp",
          "containerPort": 9090,
          "hostPort": 9090,
          "protocol": "tcp",
          "appProtocol": "http"
        }
      ],
      "essential": true,
      "environment": [],
      "environmentFiles": [],
      "mountPoints": [],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/envoy-node-bridge",
          "awslogs-region": "us-east-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "executionRoleArn": "arn:aws:iam::280706339600:role/ecsTaskExecutionRole",
  "networkMode": "bridge",
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "512",
  "memory": "1024",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  }
}
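If you keep this JSON in a file, you can register the task definition and create the service from the CLI instead of the console (the file path and service name are just examples):
aws ecs register-task-definition --cli-input-json file://envoy-node-bridge.json
aws ecs create-service --cluster sheets-staging-cluster --service-name sheets-staging-service --task-definition envoy-node-bridge --desired-count 1 --launch-type EC2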
Setting up HTTPS
Depending on what frontend you’re using, you may need to set up HTTPS on the EC2 instance. Since I’m still not at the stage of a production deployment, I decided to skip this too, by hosting a Next.js website on S3 (without HTTPS). For example, Vercel and Netlify weren’t working for me, because they are too secure: all their sites are served over HTTPS, and browsers block mixed content, so an HTTPS page isn’t allowed to call a plain-HTTP endpoint.
Conclusion
Documentation in the open source world is not great.