A full Ethereum node is often necessary for development purposes or if you don’t want to rely on 3rd parties like Infura for blockchain access. Compared to the “Ethereum killers”, running a full ETH node is relatively affordable and requires only a basic DevOps skillset. In this blog post, I’ll describe a step-by-step process to set up a full Geth node on AWS EC2. We’ll discuss hardware costs and configure an HTTPS NGINX proxy for connecting the MetaMask wallet to your proprietary node. This tutorial covers Geth version 1.10.26, so it applies to the ETH 2.0 protocol version after the Merge. I won’t delve into details but will focus on the necessary minimum to get your node up and running quickly.
Only the first part is specific to AWS. The rest of the steps will be identical on any other Cloud VPS provider or proprietary VPS server running Ubuntu.
Do I need 32 ETH to run full node after the Merge?
Let’s start by addressing this common misconception. After the Merge, locking 32 units of Ether is necessary to run a full block producer node. It means that by staking Ether, you help increase the Ethereum network security by taking part in the new proof of stake consensus mechanism. Your full staking node would be randomly chosen to append a new block to the blockchain and receive corresponding rewards or optional MEV tips.
Configuring a full non-staking node is similar to the block producer node, with the critical difference that you don’t need to own any Ether to run it. While non-staking full nodes don’t produce new blocks, they still help increase the network’s security by validating the correctness of received blocks. You can also use your proprietary full node to interact with the blockchain in a fully permissionless and uncensorable way. Centralized blockchain gateways like Infura or Alchemy currently disallow interactions with certain blacklisted smart contracts. Running your node is also necessary for more advanced blockchain use cases like running MEV arbitrage bots.
We have a lot of ground to cover, so let’s get started!
Spinning up an EC2 instance
Start with provisioning a new EC2 instance. Go to EC2 > Instances > Launch instances. Select Ubuntu Server 20.04 LTS (HVM), SSD Volume Type AMI, architecture 64-bit (x86). For Instance type, choose the m5.xlarge (16 GiB RAM, 4 vCPUs). It will cost ∼$150/month.
Next, select Create a new key pair and give it any meaningful name. Choose the RSA type and press Download Key Pair to save it on your local disk.
In Network settings, you have to allow inbound SSH traffic.
In the Configure storage section, I chose a 1250GB gp3 volume for the root volume. The monthly cost for this amount of SSD storage will be ~$100. At the time of the last update of this post (Dec 2022), an Ethereum full node needs ~1050GB of disk space. Check the current space requirements before choosing a disk size. Depending on how long you want to keep the node running, you must leave some headroom for new blocks. The current growth rate for full nodes is ~50GB/month.
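To make the sizing decision concrete, here is a minimal sketch of the headroom arithmetic using the figures quoted above. The function name is my own, and the calculation assumes the growth rate stays constant, which it may not.

```python
def months_of_headroom(volume_gb: float, used_gb: float, growth_gb_per_month: float) -> float:
    """Return how many months remain until the volume fills at a constant growth rate."""
    if growth_gb_per_month <= 0:
        raise ValueError("growth rate must be positive")
    # A volume that is already too small has zero headroom, not a negative one.
    return max(volume_gb - used_gb, 0) / growth_gb_per_month


# Figures from this post (Dec 2022): 1250 GB volume, ~1050 GB used, ~50 GB/month growth.
print(months_of_headroom(1250, 1050, 50))  # 4.0
```

With these numbers the volume leaves roughly four months of room, so plan to pick a larger size or grow the volume later if the node should run longer.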
Once the storage is configured you can now click the Launch instance button.
While the instance is initializing, you have to add more security group rules to enable the remaining protocols. You have to allow inbound traffic on port 30303 because it’s needed for P2P discovery and synchronization. Port 30303 should also be exposed for the 0.0.0.0/0 wildcard address. Additionally, if you want to configure external access to the node JSON-RPC API, you’ll have to open the ports used by the NGINX proxy set up later in this post, including 443 for HTTPS.
Now back in your terminal, change permissions for your key pair by running:
Next, go to EC2 > Instances and your new server details page. Copy its Public IPv4 address. Back in your terminal, you can now SSH into your EC2:
Running full post-merge Ethereum node on Ubuntu
Before the Merge, only a single Geth execution client process was necessary to run a full node. Because proof of stake replaced proof of work as the consensus mechanism, an additional consensus client process is now required. So-called client diversity is critical for the long-term security of the network. In this tutorial, I describe how to use the Lighthouse client, which is currently the second most popular after the Prysm client. You can check current consensus client diversity on this website. If Lighthouse becomes too popular, consider using one of the less widespread implementations.
Installing Geth execution client on Ubuntu
Let’s start by setting up the Geth process as a systemd service to run it in the background and enable automatic restarts. Run these commands to install Geth from the official repository:
Next, create the /lib/systemd/system/geth.service file with the following contents:
Now you can enable and start the Geth service by running:
and see the log output using:
You should see the following log output complaining about missing consensus client process:
Installing Lighthouse consensus client on Ubuntu
Let’s fix it by installing a Lighthouse consensus client written in Rust.
Remember to replace v3.3.0 with the newest version from the releases page!
Next, create the /lib/systemd/system/lighthouse.service file with the following contents:
Now enable and start the process:
Blockchain synchronization should start now. By querying Lighthouse logs you should see similar contents:
Optionally, you can omit the --checkpoint-sync-url https://mainnet.checkpoint.sigp.io flag, which would cause your consensus client to sync all the blocks from genesis instead of from a community-provided checkpoint. But, during my tests, that took over a week instead of 2 days for the checkpoint sync.
ERROR log entries are expected during the sync process. As long as you can see regular INFO Syncing logs, the sync is progressing.
Once the consensus client finishes the sync, Geth will have to process the downloaded blocks before it becomes operational. In this state, it will display similar log entries:
Interacting with the Geth execution client
You can verify that the node is up and running by launching a Geth console:
Inside the console, now run:
You should get a similar output indicating that the node has started the synchronization:
If you’re getting false, you should wait a minute or two for synchronization to kick off (or more if Lighthouse has not finished the initial sync yet). In case you have any issues with completing the synchronization, you can run:
to tail the log output. Optionally, you can run the geth process with the --verbosity 5 flag to increase log granularity.
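If you would rather script this check than read the console, the sketch below interprets a reply from the standard eth_syncing JSON-RPC method, which returns either false or an object with hex-encoded block numbers. The helper name and the sample values are my own, and the ratio only reflects block numbers, so treat it as a rough indicator.

```python
from typing import Optional, Union


def sync_progress(result: Union[bool, dict]) -> Optional[float]:
    """Return block-sync progress in [0, 1], or None when the node reports `false`."""
    if result is False:
        # `false` means the node is either fully synced or has not started syncing yet.
        return None
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    if highest == 0:
        return 0.0
    return current / highest


# Made-up sample reply: block 500000 out of 1000000.
sample = {"startingBlock": "0x0", "currentBlock": "0x7a120", "highestBlock": "0xf4240"}
print(sync_progress(sample))  # 0.5
print(sync_progress(False))   # None
```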
A few hours after the node has finished synchronization, it should be discoverable on ethernodes.org. You can double-check that you’ve correctly opened all the necessary ports by going to the geth attach console and running:
You should see both true and false values, meaning that your node is discoverable in the P2P network. If you’re seeing only false, you probably did not publicly expose port 30303.
The initial synchronization time depends on the hardware configuration (more details later). You can check if your node is fully synchronized by going to the geth console and running:
and compare the value with an external data source, e.g., Etherscan. You can check out official Geth docs for more info on available API methods.
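The comparison itself is simple arithmetic. Here is a minimal sketch, assuming you fetched the local value over JSON-RPC (where eth_blockNumber returns a hex string) and copied the reference block number by hand from an explorer. The function name and the sample numbers are hypothetical.

```python
def blocks_behind(local_block_hex: str, reference_block: int) -> int:
    """Return how many blocks the local node lags behind a reference block number."""
    local_block = int(local_block_hex, 16)
    # A node can be a block or two ahead of the explorer, so clamp at zero.
    return max(reference_block - local_block, 0)


# Made-up example: the node reports 0xf4240 (1000000), the explorer shows 1000010.
print(blocks_behind("0xf4240", 1_000_010))  # 10
```

A lag of a few blocks is normal because new blocks arrive every few seconds. A lag that keeps growing suggests the node is falling out of sync.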
If you’re getting 0, then check your logs for similar entries:
Their presence means that your node got out of sync and might need a few hours to catch up. If the issue does not fix itself after 10+ hours, your server probably lacks CPU, memory, or disk throughput.
Running light Ethereum node after the Merge
A light node, compared to a full node, used to be, well, a lightweight way to run a blockchain gateway. Before the Merge, it was possible to spin up a light Geth node on a free tier AWS t2.micro instance. One caveat was that light nodes depended on full nodes voluntarily providing them with up-to-date block data. Currently, after the Merge, light nodes are no longer supported, but there is work in progress to re-enable them in the future.
Password protected HTTPS access to full Geth node with NGINX
Each console method has its JSON-RPC equivalent. You can check the current block number with HTTP API by running the following cURL command:
But right now, you can only talk to the node from inside the EC2 instance. Let’s see how we can safely expose the API to the public by proxying JSON-RPC traffic with NGINX.
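For reference, a local JSON-RPC call can also be sketched in Python with only the standard library. This assumes Geth’s HTTP API is enabled and listening on http://localhost:8545, the same port the proxy targets later in this post. The helper names are my own, not part of Geth.

```python
import json
import urllib.request


def build_rpc_request(url: str, method: str, params=None, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST request."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_rpc(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and return the `result` field (raises KeyError on an RPC error reply)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["result"]


# Assumed local endpoint; nothing is sent until call_rpc() is invoked.
request = build_rpc_request("http://localhost:8545", "eth_blockNumber")
```

Run on the instance, `int(call_rpc(request), 16)` should give the current block number as a decimal integer.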
You’ll need a domain to implement this solution. It can be a root domain or a subdomain. You have to add an A DNS record pointing to the IP of your EC2 instance. It is recommended to use an Elastic IP address so that the address does not change if you have to change the instance configuration.
Next, inside the instance, you have to install the necessary packages:
You can now generate an SSL certificate and initial NGINX configuration by running:
To automatically renew your certificate add this line to /etc/crontab file:
Once you complete these steps, you should see an NGINX welcome screen on your domain:
Next, generate an HTTP basic authentication user and password:
Now you need to edit the NGINX configuration file
We use a proxy_pass directive to proxy traffic from the encrypted HTTPS port 443 to the Geth node port 8545 on our EC2 instance without exposing it publicly. Additionally, HTTP basic authentication headers are required for every request.
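To illustrate what “basic authentication headers” means for a client, here is a small sketch of how the Authorization header is built for each request. The user and pass values are placeholders, not the credentials you generated, and the helper name is my own.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Return the HTTP Basic Authorization header for the given credentials."""
    # Basic auth is just base64("user:password"); it is only safe over HTTPS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
headers = {"Content-Type": "application/json", **basic_auth_header("user", "pass")}
print(headers["Authorization"])  # Basic dXNlcjpwYXNz
```

Because the credentials are only encoded, not encrypted, this header should never be sent over plain HTTP. That is why the proxy terminates HTTPS first.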
Now verify that the config is correct:
and restart the NGINX process to apply changes:
The default welcome page should no longer be accessible. You can check if your full node is available via a secure HTTPS connection using this command executed from outside of your EC2:
Once you have it working, you can connect your browser MetaMask extension to use your personal full node for blockchain access. To do it, go to MetaMask Settings > Networks > Add a network. Give your network any name, and in the New RPC URL field, input your full node connection URL in the following format:
Enter ETH for Currency Symbol. Chain ID should be auto-filled to 1, representing the Ethereum Mainnet. You can now click Save and use your MetaMask wallet as you would normally. You’re now talking directly to the Ethereum blockchain without a trusted 3rd party like Infura or Alchemy. And if AWS is still too centralized for your blockchain needs, remember that you can use a similar setup on your proprietary hardware.
Full node hardware requirements
Below you can see graphs showing CPU and memory utilization of the m5.xlarge (16 GiB RAM, 4 vCPUs) EC2 instance after a full synchronization completed:
16GB of RAM is regularly used up to 70%, so using an 8GB instance will likely cause memory issues. During the synchronization, memory usage was lower at ~50%, but CPU was regularly spiking up to 80%. If you want to speed up the initial synchronization, then using CPU-optimized instances like c5.2xlarge could be beneficial.
The choice of hardware depends on how urgently you need the full node up and running. Make sure to avoid using t2/t3 instances. They feature a so-called “burstable” CPU, meaning consistent processor usage above the baseline (between 5% and 40% depending on instance size) would be throttled or incur additional charges.
After the synchronization is finished, node hardware requirements will differ depending on your use case. If you’re running an arbitrage bot scanning the mempool or thousands of AMM contracts on each block, you’ll need a beefier server than if you occasionally submit a few transactions.
The best way to determine the most cost-effective instance type is to continuously observe the metrics to see whether you’re running out of CPU, memory, or disk IOPS. AWS CloudWatch makes it easy to configure email alerts when metrics exceed predefined thresholds. Check out these docs for info on how to collect disk and memory usage data because they are not enabled by default.
Infura and Alchemy are currently the industry standard for everyday blockchain interactions. But knowing that I’ll always be able to access my funds even if the centralized gatekeepers go out of business vastly increases my trust in the Ethereum network. Furthermore, you’ll always be able to host full nodes on similar hardware. Storage space is only going to get cheaper. So the constantly growing size of the blockchain should never be a blocker for regular users to host full nodes and support the network.