Running Cheap gRPC Web with Docker Containers on ECS and EC2

Published on 1 March 2023

This article summarizes my quest to stubbornly run a cheap Dockerized gRPC server capable of interacting with gRPC Web. I was trying to set up a staging instance (not a production instance) for my startup.

In this article, you can learn one of two things:

  1. How to deploy Docker containers exposed to the internet for cheap on AWS (price of a micro instance)
  2. How to deploy a gRPC web server with Envoy on AWS ECS

In this example, we use a TypeScript Node server, but you can generalize it to any gRPC server.

I’m aware there are other cloud providers, but I have AWS credits and, for better or worse, brand loyalty to AWS for production. Eventually, I’d need to figure out how to deploy there anyway.

Background

I won’t be explaining everything from scratch; I’ll assume some basic knowledge of what gRPC, ECS, EC2, VPCs, and subnets are.

ECS allows you to deploy containers. The service itself is free. However, you can either use AWS Fargate, which is really, really expensive but really convenient, or use an EC2 instance to back the containers. Being cheap and also refusing to back down due to my ego, I decided to forgo the convenience of Fargate and try to see if I could spin up the containers on a micro instance.

To contact a gRPC server from a web browser, you need a proxy that translates gRPC-Web requests into regular gRPC (Envoy is the one the grpc-web project uses by default), so it’s not as simple as spinning up a server.
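To make the proxy’s job a bit more concrete, here is a minimal sketch of what a gRPC-Web client sends for a unary call before the proxy translates it. The names `GRPC_WEB_HEADERS` and `frameGrpcWebMessage` are hypothetical and only for illustration; in practice, the generated grpc-web client code handles this for you.

```typescript
// Headers a gRPC-Web client typically sends with a binary protobuf request.
// x-grpc-web also appears in the CORS allow_headers list of the Envoy config
// later in this article.
const GRPC_WEB_HEADERS: Record<string, string> = {
  "content-type": "application/grpc-web+proto",
  "x-grpc-web": "1",
};

// Wrap an already-serialized protobuf message in a length-prefixed frame:
// 1 flag byte (0 = uncompressed data), then a 4-byte big-endian payload length.
function frameGrpcWebMessage(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(5 + payload.length);
  const view = new DataView(frame.buffer);
  view.setUint8(0, 0); // data frame, not compressed
  view.setUint32(1, payload.length, false); // false = big-endian
  frame.set(payload, 5);
  return frame;
}

const framed = frameGrpcWebMessage(new Uint8Array([1, 2, 3]));
console.log(Array.from(framed)); // [0, 0, 0, 0, 3, 1, 2, 3]
```

The browser sends this over ordinary HTTP requests; the proxy is what turns it into the HTTP/2 gRPC call that the Node server understands.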

In this article, we’ll be spinning up two containers in one task: the Envoy proxy and the Node gRPC server.

Bridge networking vs awsvpc

You can find more details about these networking modes elsewhere on the internet. I spent a long time trying to get awsvpc to work, but it turns out that you can’t assign a public IP to ECS tasks running on EC2 in that mode. This answer gives two ways to work around it, but otherwise, there’s no easy way to expose the container to the internet.

Exposing the container is possible with a load balancer. You can have the load balancer be exposed to public inbound traffic and then direct it into your VPC. But again, I’m trying to spend, at maximum, $10 a month here. A load balancer is another $10, just for my staging instance!

In bridge networking, you can map host ports to the ports that the container listens on. We map each container port to the same host port in this case for convenience, but this also means that, if your EC2 instance is exposed to the internet on a certain port, then the ECS container listening on that port is also exposed to the internet.

In our case, our EC2 instance exposes port 443 (through the security group rules). Our Envoy proxy listens on container port 443 (which is mapped to the EC2 instance’s port 443). It forwards to host port 9090 (NOT a port inside the Envoy container, as explained in the config below), which is mapped to the port that the Node server listens on.
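The path a request takes can be sketched as a small model. This is only an illustration of the port mappings described above; the type `PortMapping`, the function `traceRequest`, and the host name `staging.example.com` are hypothetical, and `172.17.0.1` assumes Docker’s default bridge network.

```typescript
// One entry per ECS port mapping in bridge mode: host port -> container port.
interface PortMapping {
  container: string;
  hostPort: number;
  containerPort: number;
}

// Gateway of Docker's default bridge network, i.e. the EC2 host as seen
// from inside a container (assumption: the default docker0 subnet is used).
const BRIDGE_GATEWAY = "172.17.0.1";

const mappings: PortMapping[] = [
  { container: "envoy", hostPort: 443, containerPort: 443 },
  { container: "node", hostPort: 9090, containerPort: 9090 },
];

// List the hops a browser request takes to reach the Node server.
function traceRequest(publicHost: string, maps: PortMapping[]): string[] {
  const mappingFor = (name: string): PortMapping => {
    const m = maps.find((x) => x.container === name);
    if (!m) throw new Error(`no port mapping for container "${name}"`);
    return m;
  };
  const envoy = mappingFor("envoy");
  const node = mappingFor("node");
  return [
    `browser -> ${publicHost}:${envoy.hostPort} (security group must allow this port)`,
    `host:${envoy.hostPort} -> envoy container:${envoy.containerPort}`,
    `envoy -> ${BRIDGE_GATEWAY}:${node.hostPort} (the host, via the bridge gateway)`,
    `host:${node.hostPort} -> node container:${node.containerPort}`,
  ];
}

console.log(traceRequest("staging.example.com", mappings).join("\n"));
```

The third hop is the non-obvious one: Envoy cannot reach the Node container through its own loopback address, so it goes back out to the host and in through the host port mapping.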

Envoy Dockerfile

The --platform=linux/amd64 flag is needed when building on M1 Macs, so that the image is built for the x86_64 EC2 instance rather than for ARM.

FROM --platform=linux/amd64 envoyproxy/envoy-dev:e515b02318e55f7f8bcef8db1fde21c1d46990b0
COPY envoy-staging.yaml /etc/envoy/envoy.yaml
RUN chmod go+r /etc/envoy/envoy.yaml
CMD /usr/local/bin/envoy -c /etc/envoy/envoy.yaml

Envoy Config

The key trick in the Envoy config is the address below the comment near the end of the file: in bridge networking, 172.17.0.1 (the default Docker bridge gateway) lets a container reach the host. The file envoy-staging.yaml, referenced by the Dockerfile above:

# Taken mostly from grpc-web envoy config in hello world demo
# https://github.com/grpc/grpc-web/tree/master/net/grpc/gateway/examples/helloworld
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address: { address: 0.0.0.0, port_value: 443 }
      filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              codec_type: auto
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                  - name: local_service
                    domains: ["*"]
                    routes:
                      - match: { prefix: "/" }
                        route:
                          cluster: web_backend
                          max_stream_duration:
                            grpc_timeout_header_max: 0s
                    cors:
                      allow_origin_string_match:
                        - prefix: "*"
                      allow_methods: GET, PUT, DELETE, POST, OPTIONS
                      allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                      max_age: "1728000"
                      expose_headers: custom-header-1,grpc-status,grpc-message
              http_filters:
                - name: envoy.filters.http.grpc_web
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
                - name: envoy.filters.http.cors
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
                - name: envoy.filters.http.router
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: web_backend
      connect_timeout: 0.25s
      type: logical_dns
      http2_protocol_options: {}
      lb_policy: round_robin
      load_assignment:
        cluster_name: cluster_0
        endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    # This assumes bridge networking mode, now we want to
                    # forward to the host port
                    # https://stackoverflow.com/a/68021591/892168
                    address: 172.17.0.1
                    port_value: 9090

TypeScript gRPC Server Dockerfile

Make sure to replace yarn dockerbuild with your own build. In dockerbuild, I skip the step of regenerating the protobufs due to some particular details with my directory structure, but you could just run yarn build here. Also note that my build directory is build/, though I think this might be dist/ by default.

Either way, you can find on Google how to build an image for your server.

# https://www.andreadiotallevi.com/blog/how-to-create-a-production-image-for-a-node-typescript-app-using-docker-multi-stage-builds
# TODO: This uses yarn dockerbuild right now, which doesn't update the protos.
# Have to be careful.
FROM --platform=linux/amd64 node:16-alpine AS builder
WORKDIR /app
COPY . .
RUN yarn install
RUN yarn dockerbuild

FROM --platform=linux/amd64 node:16-alpine AS final
WORKDIR /app
COPY --from=builder ./app/build ./build
COPY package.json .
COPY yarn.lock .
RUN yarn install --production
CMD [ "yarn", "start" ]

EC2 Instance

You want to use an ECS compatible AMI. The exact AMI that I’m using is ami-06502972b2860f143 (amzn2-ami-ecs-hvm-2.0.20230109-x86_64-ebs), which is the standard Linux one provided by Amazon Web Services.

It’s important to set the user data here, as this custom setup is essentially the trickiest part of not using Fargate. When you set up the instance, there will be a space for user data, for which you should provide:

#!/bin/bash
echo ECS_CLUSTER=sheets-staging-cluster >> /etc/ecs/ecs.config
echo ECS_LOGLEVEL=debug >> /etc/ecs/ecs.config
echo "ECS_AVAILABLE_LOGGING_DRIVERS=[\"awslogs\",\"json-file\"]" >> /etc/ecs/ecs.config
echo ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true >> /etc/ecs/ecs.config

Double checking the ECS config

In this section, I’ll give some debugging tips, in case things don’t go in the ideal way.

Be careful with the quoting in the user data. If you’re running into issues related to missing attributes, you should SSH to the instance to check the ECS config:

$ cat /etc/ecs/ecs.config
ECS_CLUSTER=sheets-staging-cluster
ECS_LOGLEVEL=debug
ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file","syslog"]
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true
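As a sketch of what to look for in that output, the check below parses the contents of ecs.config and reports the two logging-related problems discussed in this article. The function names `parseEcsConfig` and `checkLoggingConfig` are hypothetical, and the check assumes the driver list must be a valid JSON array, which is how it is written above.

```typescript
// Parse KEY=VALUE lines from the contents of /etc/ecs/ecs.config.
function parseEcsConfig(text: string): Map<string, string> {
  const entries = new Map<string, string>();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    const eq = trimmed.indexOf("=");
    if (trimmed === "" || trimmed.startsWith("#") || eq < 0) continue;
    entries.set(trimmed.slice(0, eq), trimmed.slice(eq + 1));
  }
  return entries;
}

// Return a list of problems; an empty list means the logging setup looks fine.
function checkLoggingConfig(text: string, requiredDriver = "awslogs"): string[] {
  const config = parseEcsConfig(text);
  const problems: string[] = [];
  const raw = config.get("ECS_AVAILABLE_LOGGING_DRIVERS");
  if (raw === undefined) {
    problems.push("ECS_AVAILABLE_LOGGING_DRIVERS is not set");
  } else {
    try {
      const drivers: unknown = JSON.parse(raw);
      if (!Array.isArray(drivers) || drivers.indexOf(requiredDriver) === -1) {
        problems.push(`driver "${requiredDriver}" is not in ${raw}`);
      }
    } catch {
      problems.push(`ECS_AVAILABLE_LOGGING_DRIVERS is not a valid JSON array: ${raw}`);
    }
  }
  if (config.get("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE") !== "true") {
    problems.push("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE is not true");
  }
  return problems;
}

const goodConfig = [
  "ECS_CLUSTER=sheets-staging-cluster",
  'ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file"]',
  "ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true",
].join("\n");

console.log(checkLoggingConfig(goodConfig)); // []
// Quotes eaten by the shell: the value is no longer valid JSON, so one problem is reported.
console.log(checkLoggingConfig(goodConfig.replace(/"/g, "")));
```

The second call mimics the most likely mistake: writing the echo line in the user data without escaping the inner double quotes.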

You can manually change the config of the instance, but you may need to restart the ECS agent:

sudo systemctl stop ecs && sudo systemctl start ecs

Finally, you can check which environment variables the ecs-agent container is actually configured with:

docker inspect ecs-agent | grep Env -A 15

ECS Task Definition

Create an ECS cluster in your desired region. Upload your Docker images to Amazon (steps not included in this article). We’re going to create a task definition, and then instantiate a service using it.

You can use the UI to set up something with the ports as you desire. I’ve also included the JSON version of my task definition below. The important things are:

  • Two containers: one for Envoy, one for Node
  • Automatic log configuration using awslogs as the log driver
  • Bridge networking mode
  • Required ports

The logs are optional, but I highly recommend them, because I (like many people) ran into issues with my containers on ECS that didn’t show up when running them locally. For example, I was building them on an M1 Mac. Without logging, you won’t be able to see these errors, and your tasks will seem to fail randomly.

Note that, at the time of writing, the log configuration can only be set through the UI when you first create the task definition. If you edit the task definition later, you need to type it into the JSON.

Error when setting up logging

Logging ended up being a huge headache for me, because if you don’t configure the EC2 instance properly, you’ll see an error that says something like:

is missing an attribute required by your task

The internet suggests using the ECS CLI to check attributes, but for me, it said that all attributes were fine. The error message itself, unfortunately, is not very clear. What it really means is that you haven’t allowed the correct log drivers on the EC2 instance.

If you are still seeing this message, then you should double check:

  • you’re not using awsvpc networking (in which case your security group might be wrong)
  • you set up the user data on the EC2 instance as shown above
  • the user data declares the allowed log drivers correctly
  • ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE is set

JSON Task Definition

{
    "family": "envoy-node-bridge",
    "containerDefinitions": [
        {
            "name": "sheets-envoy",
            "image": "280706339600.dkr.ecr.us-east-2.amazonaws.com/sheets-envoy:dhost",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "sheets-envoy-443-tcp",
                    "containerPort": 443,
                    "hostPort": 443,
                    "protocol": "tcp",
                    "appProtocol": "grpc"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/envoy-node-bridge",
                    "awslogs-region": "us-east-2",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        },
        {
            "name": "sheets-node",
            "image": "280706339600.dkr.ecr.us-east-2.amazonaws.com/sheets-node:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "sheets-node-9090-tcp",
                    "containerPort": 9090,
                    "hostPort": 9090,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            "essential": true,
            "environment": [],
            "environmentFiles": [],
            "mountPoints": [],
            "volumesFrom": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-create-group": "true",
                    "awslogs-group": "/ecs/envoy-node-bridge",
                    "awslogs-region": "us-east-2",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "executionRoleArn": "arn:aws:iam::280706339600:role/ecsTaskExecutionRole",
    "networkMode": "bridge",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "512",
    "memory": "1024",
    "runtimePlatform": {
        "cpuArchitecture": "X86_64",
        "operatingSystemFamily": "LINUX"
    }
}
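Before registering a task definition like the one above, a quick sanity check of the points that mattered in this setup can save a failed deployment. The sketch below is a hypothetical helper, not part of any AWS tooling: the types cover only the fields it inspects, and the name `checkTaskDefinition` is made up for illustration.

```typescript
// Only the fields this check looks at; a real task definition has many more.
interface PortMappingDef { containerPort: number; hostPort?: number }
interface ContainerDef {
  name: string;
  portMappings?: PortMappingDef[];
  logConfiguration?: { logDriver: string };
}
interface TaskDef {
  networkMode?: string;
  containerDefinitions: ContainerDef[];
}

// Return a list of problems; an empty list means the basics look right.
function checkTaskDefinition(task: TaskDef): string[] {
  const problems: string[] = [];
  if (task.networkMode !== "bridge") {
    problems.push(`networkMode is "${task.networkMode}", expected "bridge"`);
  }
  const seenHostPorts = new Map<number, string>();
  for (const c of task.containerDefinitions) {
    if (c.logConfiguration?.logDriver !== "awslogs") {
      problems.push(`${c.name}: no awslogs log configuration`);
    }
    for (const p of c.portMappings ?? []) {
      if (p.hostPort === undefined) continue;
      const owner = seenHostPorts.get(p.hostPort);
      if (owner !== undefined) {
        problems.push(`${c.name}: host port ${p.hostPort} already used by ${owner}`);
      }
      seenHostPorts.set(p.hostPort, c.name);
    }
  }
  return problems;
}

// Abridged version of the task definition from this article.
const task: TaskDef = {
  networkMode: "bridge",
  containerDefinitions: [
    {
      name: "sheets-envoy",
      portMappings: [{ containerPort: 443, hostPort: 443 }],
      logConfiguration: { logDriver: "awslogs" },
    },
    {
      name: "sheets-node",
      portMappings: [{ containerPort: 9090, hostPort: 9090 }],
      logConfiguration: { logDriver: "awslogs" },
    },
  ],
};

console.log(checkTaskDefinition(task)); // []
```

The host port check matters in bridge mode because two containers on the same instance cannot both claim the same host port.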

Setting up HTTPS

Depending on what frontend you’re using, you may need to set up HTTPS on the EC2 instance. Since I’m still not at the stage of a production deployment, I decided to skip this step by hosting a Next.js website on S3 (without HTTPS). Vercel and Netlify weren’t working for me because they are too secure: all their websites are served over HTTPS, and browsers don’t allow a page loaded over HTTPS to contact a plain HTTP endpoint.
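The rule that tripped me up can be sketched in a few lines. This is a simplified version of the browser’s mixed-content check, not the full specification (real browsers make exceptions, for example for localhost); the function name `isBlockedAsMixedContent` and the example URLs are hypothetical.

```typescript
// Simplified mixed-content rule: a page served over HTTPS may not call a
// plain-HTTP endpoint. Exceptions such as localhost are ignored here.
function isBlockedAsMixedContent(pageUrl: string, endpointUrl: string): boolean {
  const page = new URL(pageUrl);
  const endpoint = new URL(endpointUrl);
  return page.protocol === "https:" && endpoint.protocol === "http:";
}

// Frontend on an HTTPS-only host calling the plain-HTTP Envoy endpoint: blocked.
console.log(isBlockedAsMixedContent("https://my-app.example.com", "http://staging.example.com:443")); // true

// Frontend served over plain HTTP calling the same endpoint: allowed.
console.log(isBlockedAsMixedContent("http://my-site.example.com", "http://staging.example.com:443")); // false
```

Note that listening on port 443 does not make the endpoint HTTPS; the Envoy config in this article has no TLS settings, so the endpoint is still plain HTTP.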

Conclusion

Documentation in the open source world is not great.
