How to Optimize MEV Arbitrage Smart Contract with Yul and Huff

 
MEV arbitrage gas savings represented by coins. Photo by Marcel Strauß on Unsplash

Minimizing gas usage directly impacts the profitability of your MEV bot. In this blog post, we will start with a straightforward but nonoptimal approach for swapping between two UniswapV2 pairs and gradually improve it. We will start with Solidity-level fixes and later descend into the lower layers of the EVM with the Yul and Huff assembly languages.

Disclaimer: The information provided in this blog post is for educational purposes only and should not be treated as financial advice.

What to expect from this post?

We will not build a competitive MEV bot here. Instead, we will discuss various EVM gas optimization techniques and measure their impact. This knowledge should be applicable to any EVM (Ethereum Virtual Machine) project, not only to MEV bots. All the code examples are available in this repo.

We will be working with the classic BundleExecutor smart contract from the simple-arbitrage repo by Flashbots. I chose this example because this repo is probably a starting point for many aspiring MEV searchers, and the current implementation is nonoptimal.

Let’s see the initial version of the uniswapWeth method:

contracts/BundleExecutor.sol

  function uniswapWeth(
    uint256 _wethAmountToFirstMarket, uint256 _ethAmountToCoinbase,
    address[] memory _targets, bytes[] memory _payloads) external onlyExecutor payable {
        require (_targets.length == _payloads.length);
        uint256 _wethBalanceBefore = WETH.balanceOf(address(this));
        WETH.transfer(_targets[0], _wethAmountToFirstMarket);
        for (uint256 i = 0; i < _targets.length; i++) {
            (bool _success, bytes memory _response) = _targets[i].call(_payloads[i]);
            require(_success); _response;
        }

        uint256 _wethBalanceAfter = WETH.balanceOf(address(this));
        require(_wethBalanceAfter > _wethBalanceBefore + _ethAmountToCoinbase);
        if (_ethAmountToCoinbase == 0) return;

        uint256 _ethBalance = address(this).balance;
        if (_ethBalance < _ethAmountToCoinbase) {
            WETH.withdraw(_ethAmountToCoinbase - _ethBalance);
        }
        block.coinbase.transfer(_ethAmountToCoinbase);
    }

This method starts by sending WETH to the first pair. Then it executes the provided payload on each target (i.e., the swap method of a UniswapV2 pair). After the swaps are completed, it verifies profitability and sends the _ethAmountToCoinbase bribe to the coinbase, i.e., the validator who constructs the new block. If you don’t understand how this method works, check out this in-depth explanation of simple-arbitrage repo.

Creating MEV arbitrage opportunity

We’ll start by measuring the current gas cost and profitability of the initial contract version. Since we won’t have an arbitrage opportunity lying around, we have to do a more complex setup. In theory, we could work on any opportunity that manifested in a past block. But this would mean that the examples can only be executed on archive nodes (standard full nodes prune state data for older blocks).

To work around this limitation, we will fork the Ethereum mainnet using Anvil. Then we will use its anvil_setStorageAt method to mock relevant storage slots and artificially create an arbitrage opportunity. This approach saves us the work of manually populating bytecodes and the rest of the data, which we would have to do if we started our blockchain from scratch.

Problem with analyzing EVM gas usage with Forge

I’ve started working on this post using Forge to measure gas usage. Forge offers an awesome overview of gas usage details and call traces:

Foundry gas traces output: Foundry shows detailed transaction execution traces


Unfortunately, I’ve noticed that the presented results are not accurate. One reason is that the EVM charges less gas for subsequent accesses to the same storage slot within a transaction. This is known as a “cold” vs. “warm” slot. For example, according to evm.codes, executing the sload opcode costs 2100 gas units for a cold storage slot but only 100 for a warm one. So, these differences are not insignificant and can quickly add up.

Forge executes all the simulation steps inside a single transaction. It means that once we reach a method that we want to benchmark, some slots have already been “warmed up”. So, the gas usage results displayed by Foundry are usually underestimated.
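To get a feel for how quickly the cold vs. warm difference adds up, here is a minimal Rust sketch. The two constants are the evm.codes figures quoted above; the function names and the idea of counting "pre-warmed" slots are illustrative assumptions, not something taken from the repo or from Foundry itself.

```rust
/// Gas charged by `sload` for a slot not yet touched in the current transaction.
const COLD_SLOAD_GAS: u64 = 2_100;
/// Gas charged by `sload` once the slot was already accessed in the same transaction.
const WARM_SLOAD_GAS: u64 = 100;

/// Total `sload` gas for reading one slot `reads` times when it starts out cold:
/// the first read pays the cold price, every later read pays the warm price.
fn sload_gas_from_cold(reads: u64) -> u64 {
    if reads == 0 {
        return 0;
    }
    COLD_SLOAD_GAS + (reads - 1) * WARM_SLOAD_GAS
}

/// Gas that a benchmark fails to report when `prewarmed_slots` distinct slots
/// were already touched earlier in the same transaction (hypothetical helper).
fn hidden_gas(prewarmed_slots: u64) -> u64 {
    prewarmed_slots * (COLD_SLOAD_GAS - WARM_SLOAD_GAS)
}
```

Under this simplified model, every slot that the test setup touches before the measured call hides 2000 gas from the reported result.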

Another area where I could not measure consistent gas usage improvements was fixes related to calldata size and format. Since Forge executes the measured method as an internal call rather than as a top-level transaction, the per-byte cost of transaction calldata is not reflected in the results.

Maybe I’m missing something. If you know a way to accurately measure gas usage with Foundry, please let me know in the comments. In theory, we could run methods manually without embedding them in the test. But this would mean that I cannot use cheatcodes to prepare the mocked data (they themselves “warm up” storage slots).

For these reasons, I’ve switched to doing the setup directly in Rust with ethers-rs and sending transactions to the Anvil fork. Based on my benchmarks, this is a more accurate method of measuring transaction gas cost.

Transactions sent this way were usually ~3% more expensive than ones triggered via Forge tests. This makes sense because of the cold/warm storage slots issue. Please also remember that the eth_estimateGas RPC method often has an error as high as ~7%.

Measuring gas usage using ethers-rs and Anvil

Let’s now implement our first simulation. All the subsequent code examples are available in this repo. You can execute them without additional setup because they use a publicly available Infura endpoint.

To keep the post and examples concise, we will skip calculating the optimal amount for an arbitrage swap. You can check out the Flashbots repo for this formula. Instead, we will keep our calculations simple, swapping 0.1 WETH for DAI and later back to WETH.

Let’s start by implementing UniswapV2 calculations:

src/source/utils.rs

pub fn get_amount_out(amount_in: U256, reserve_in: U256, reserve_out: U256) -> U256 {
    let amount_in_with_fee = amount_in * U256::from(997_u64); // uniswap fee 0.3%
    let numerator = amount_in_with_fee * reserve_out;
    let denominator = reserve_in * U256::from(1000_u64) + amount_in_with_fee;
    numerator / denominator
}

This is the standard formula determining how much of the target ERC20 token we receive from a UniswapV2 pool for a given input amount, based on the pool’s current reserves.
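To make the formula more tangible, here is a small worked sketch that chains two swaps, which is what our round-trip arbitrage does. It is a standalone variant using u128 with checked arithmetic instead of U256, so it only works for small illustrative numbers; the round_trip helper and the pool reserves in the usage note are made up for this example.

```rust
/// UniswapV2 output amount with the 0.3% fee applied.
/// Returns `None` if an intermediate product overflows `u128` or the pool is empty.
fn get_amount_out(amount_in: u128, reserve_in: u128, reserve_out: u128) -> Option<u128> {
    let amount_in_with_fee = amount_in.checked_mul(997)?;
    let numerator = amount_in_with_fee.checked_mul(reserve_out)?;
    let denominator = reserve_in
        .checked_mul(1000)?
        .checked_add(amount_in_with_fee)?;
    if denominator == 0 {
        return None;
    }
    Some(numerator / denominator)
}

/// Swaps X for Y in the first pool and Y back to X in the second one.
/// Each pool is given as `(reserve_in, reserve_out)` for the direction of its swap.
fn round_trip(
    amount_in: u128,
    first_pool: (u128, u128),
    second_pool: (u128, u128),
) -> Option<u128> {
    let (first_in, first_out) = first_pool;
    let (second_in, second_out) = second_pool;
    let intermediate = get_amount_out(amount_in, first_in, first_out)?;
    get_amount_out(intermediate, second_in, second_out)
}
```

For example, with a first pool holding 1,000,000 X and 2,000,000 Y and a second pool holding 1,000,000 of each, sending 1000 X through both pools returns 1982 X. The fee is charged twice, which is why the result is below the naive 2000.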

Here’s an excerpt from the method for mocking Uniswap reserves and ERC20 balances to create an arbitrage:

src/source/anvil_utils.rs

pub async fn prepare_data(anvil: &AnvilInstance, executor_addr: H160) -> Result<()> {
    mock_storage_slot(
        anvil,
        Addresses::SushiPair.addr(),
        "0x8", // getReserves slot
        "0x665c6fcf00000000004d4a487a40a07d962e0000000453229E2B8F0ABE706380",
    )
    .await?;
    let executor_addr = format!("{:?}", executor_addr);
    let executor_addr = executor_addr.chars().skip(2).collect::<String>();

    let weth_executor_balance_slot = hex::decode(format!("000000000000000000000000{}0000000000000000000000000000000000000000000000000000000000000003", executor_addr)).unwrap();
    let weth_executor_balance_slot = hex::encode(keccak256(&weth_executor_balance_slot));
    mock_storage_slot(
        anvil,
        Addresses::WETH.addr(),
        &format!("0x{}", weth_executor_balance_slot),
        "0x000000000000000000000000000000000000000000000000016345785D8A0000",
    )
    .await?;
    // ...
}

async fn mock_storage_slot(
    anvil: &AnvilInstance,
    address: H160,
    slot: &str,
    value: &str,
) -> Result<()> {
    let call_data = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "anvil_setStorageAt",
        "params": [format!("{:?}",address), slot, value]
    });

    let json_client = Client::new();
    json_client
        .post(anvil.endpoint())
        .json(&call_data)
        .send()
        .await?
        .text()
        .await?;
    Ok(())
}

Following Solidity storage layout conventions, we compute keccak256-hashed storage slots and populate them using the anvil_setStorageAt method.
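The hex strings above can look like magic numbers, so here is a sketch of how they are put together. It assumes the UniswapV2 pair packs reserve0, reserve1 and blockTimestampLast into a single slot (lowest bytes first) and that the WETH balanceOf mapping lives at slot 3, as used in prepare_data. Both helper names are hypothetical, and the final keccak256 hashing step is left out to keep the example dependency-free.

```rust
/// Packs pair reserves into one 32-byte storage word, printed as hex:
/// `blockTimestampLast` (uint32) | `reserve1` (uint112) | `reserve0` (uint112),
/// from the most significant bytes to the least significant ones.
fn pack_reserves(reserve0: u128, reserve1: u128, block_timestamp_last: u32) -> String {
    let max_uint112 = (1u128 << 112) - 1;
    assert!(
        reserve0 <= max_uint112 && reserve1 <= max_uint112,
        "reserve exceeds uint112"
    );
    format!(
        "0x{:08x}{:028x}{:028x}",
        block_timestamp_last, reserve1, reserve0
    )
}

/// Builds the 64-byte preimage whose keccak256 hash is the storage slot of
/// `mapping(address => uint256)[holder]` for a mapping declared at `mapping_slot`.
fn mapping_slot_preimage(holder: [u8; 20], mapping_slot: u64) -> [u8; 64] {
    let mut preimage = [0u8; 64];
    // the key: address left-padded to 32 bytes
    preimage[12..32].copy_from_slice(&holder);
    // the mapping's own slot index encoded as a 32-byte big-endian integer
    preimage[56..64].copy_from_slice(&mapping_slot.to_be_bytes());
    preimage
}
```

Calling pack_reserves with the DAI reserve, the WETH reserve and the timestamp from the mocked value reproduces the same 32-byte word (in lowercase hex) that we pass to anvil_setStorageAt for the Sushi pair.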

Here’s how we’ll declare ethers-rs interfaces for ERC20 tokens (WETH, DAI) and pools (UniswapV2, Sushi).

src/source/actors.rs

abigen!(IERC20, "src/abi/erc20.json");
abigen!(IUniV2Pair, "src/abi/uniswap_v2_pair.json");

pub enum Addresses {
    WETH,
    DAI,
    UniPair,
    SushiPair,
}

impl Addresses {
    pub fn addr(&self) -> H160 {
        match self {
            Addresses::WETH => "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2"
                .parse()
                .unwrap(),
            Addresses::DAI => "0x6b175474e89094c44da98b954eedeac495271d0f"
                .parse()
                .unwrap(),
            Addresses::SushiPair => "0xC3D03e4F041Fd4cD388c549Ee2A29a9E5075882f"
                .parse()
                .unwrap(),
            Addresses::UniPair => "0xA478c2975Ab1Ea89e8196811F51A7B7Ade33eB11"
                .parse()
                .unwrap(),
        }
    }

    pub fn addr_str(&self) -> String {
        let addr = format!("{:?}", self.addr());
        addr.chars().skip(2).collect::<String>()
    }
}

pub fn weth(provider: Arc<Provider<Ws>>) -> IERC20<Provider<Ws>> {
    IERC20::new(Addresses::WETH.addr(), provider.clone())
}

pub fn dai(provider: Arc<Provider<Ws>>) -> IERC20<Provider<Ws>> {
    IERC20::new(Addresses::DAI.addr(), provider.clone())
}

pub fn uni_pair(provider: Arc<Provider<Ws>>) -> IUniV2Pair<Provider<Ws>> {
    IUniV2Pair::new(Addresses::UniPair.addr(), provider.clone())
}

pub fn sushi_pair(provider: Arc<Provider<Ws>>) -> IUniV2Pair<Provider<Ws>> {
    IUniV2Pair::new(Addresses::SushiPair.addr(), provider.clone())
}

Here’s a helper method for deploying a smart contract and returning its address:

src/source/anvil_utils.rs

pub async fn deploy_contract(name: &str, anvil: &AnvilInstance) -> Result<H160> {
    let output = Command::new("forge")
        .arg("create")
        .arg(format!("contracts/{}", name))
        .arg("--rpc-url")
        .arg(anvil.endpoint())
        .arg("--private-key")
        .arg(std::env::var("PRIVATE_KEY").expect("PRIVATE_KEY not set!"))
        .arg("--json")
        .arg("--constructor-args")
        .arg(std::env::var("ACCOUNT").expect("ACCOUNT not set!"))
        .output()?;
    let stdout = String::from_utf8_lossy(&output.stdout);
    let deployment_json: Value = serde_json::from_str(&stdout).unwrap();

    let executor_address: H160 = deployment_json["deployedTo"]
        .as_str()
        .unwrap()
        .parse()
        .unwrap();
    println!("Deployed {}: {:?}", name, &executor_address);
    Ok(executor_address)
}

Now let’s implement our first simulation:

src/bundle_executor_before.rs

abigen!(IBundleExecutor, "src/abi/bundle-executor-before.json");

#[tokio::main]
async fn main() -> Result<()> {
    env_logger::init();
    let rpc_url: Url = std::env::var("ETH_RPC_URL").unwrap().parse()?;
    let anvil = Anvil::new().fork(rpc_url).block_time(1_u64).spawn();
    let anvil_provider = Ws::connect(anvil.ws_endpoint()).await?;
    let anvil_provider = Arc::new(Provider::new(anvil_provider));

    let executor_address =
        deploy_contract("BundleExecutor-before.sol:FlashBotsMultiCall", &anvil).await?;

    let executor = IBundleExecutor::new(executor_address, anvil_provider.clone());
    let uni_pair = uni_pair(anvil_provider.clone());
    let sushi_pair = sushi_pair(anvil_provider.clone());

    prepare_data(&anvil, executor.address()).await?;

    let weth_amount_in = ETHER.div(10);
    let (uni_dai_reserve, uni_weth_reserve, _): (u128, u128, _) =
        uni_pair.get_reserves().call().await?;
    let (sushi_dai_reserve, sushi_weth_reserve, _): (u128, u128, _) =
        sushi_pair.get_reserves().call().await?;

    let dai_amount_out = get_amount_out(
        weth_amount_in,
        U256::from(uni_weth_reserve),
        U256::from(uni_dai_reserve),
    );

    let weth_amount_out = get_amount_out(
        dai_amount_out,
        U256::from(sushi_dai_reserve),
        U256::from(sushi_weth_reserve),
    );

    let swap1 = uni_pair
        .swap(
            dai_amount_out,
            U256::zero(),
            sushi_pair.address(),
            Bytes::new(),
        )
        .tx;
    let swap1 = swap1.data().unwrap().clone();

    let swap2 = sushi_pair
        .swap(
            U256::zero(),
            weth_amount_out,
            executor_address,
            Bytes::new(),
        )
        .tx;
    let swap2 = swap2.data().unwrap().clone();

    let client = SignerMiddleware::new(
        anvil_provider.clone(),
        env::var("PRIVATE_KEY")
            .expect("PRIVATE_KEY must be set")
            .parse::<LocalWallet>()
            .unwrap()
            .clone()
            .with_chain_id(1_u64),
    );

    let swap_tx = executor
        .uniswap_weth(
            weth_amount_in,
            ETHER.div(1000000),
            vec![Addresses::UniPair.addr(), Addresses::SushiPair.addr()],
            vec![swap1, swap2],
        )
        .tx;

    let pending_tx = client.send_transaction(swap_tx, None).await?;
    // awaiting the pending transaction resolves to Option<TransactionReceipt>
    let receipt = pending_tx.await?;

    let weth_balance_after = weth(anvil_provider.clone())
        .balance_of(executor_address)
        .call()
        .await?;

    let gas_price_gwei = 15;
    let gas_price_wei = GWEI.mul(gas_price_gwei);

    println!("ETH price: ${}", ETHER_PRICE_USD);
    println!("WETH amount in: {}", weth_amount_in);
    println!("WETH amount out: {}", weth_balance_after);
    let full_profit = weth_balance_after - weth_amount_in;
    println!("Total WETH profit: {}", full_profit);
    println!("Total USD profit: ${:.2}", eth_to_usd(full_profit));
    println!("Gas price GWEI: {}", gas_price_gwei);
    let gas_used = receipt.unwrap().cumulative_gas_used;

    println!("Gas used: {}", gas_used);
    let gas_cost = gas_used * gas_price_wei;
    println!("Gas cost: {}", gas_cost);
    println!("Gas cost USD: ${:.2}", eth_to_usd(gas_cost));
    let real_profit = full_profit - gas_cost;
    println!("Real WETH profit: {}", real_profit);
    println!("Real USD profit: ${:.2}", eth_to_usd(real_profit));

    Ok(())
}

We start by forking the Mainnet with Anvil and instantiating anvil_provider. As previously discussed, prepare_data creates an artificial arbitrage opportunity. Later, we construct the transaction payload for WETH -> DAI and DAI -> WETH swaps based on amounts calculated with the get_amount_out method.

Then we execute the uniswapWeth method, with a coinbase bribe to increase our chances of inclusion in the next block (bribe is minimal to simplify calculations for other examples). You can run the simulation like this:

src/executor_before.rs

cargo run --bin executor_before

You should see the following output:

ETH price: $3800
WETH amount in: 100000000000000000
WETH amount out: 102769314499031232
Total WETH profit: 2769314499031232
Total USD profit: $10.52
Gas price GWEI: 15
Gas used: 156749
Gas cost: 2351235000000000
Gas cost USD: $8.93
Real WETH profit: 418079499031232
Real USD profit: $1.59

The current implementation of our uniswapWeth method consumes 156749 gas. For the rest of this blog post, we will assume the ETH price at $3800 and the gas price at 15 GWEI. Under these conditions, we’ve grossed over $10 in total profit, scoring us $1.59 after subtracting gas costs. In practice, such profit margins are impossible for popular arbitrage opportunities on the Mainnet. But for the sake of this tutorial, let’s imagine a perfect world where other MEV bots don’t exist.
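The numbers in this output are easy to verify by hand. Below is a minimal sketch of the arithmetic, using plain u128 instead of U256. The helper names are hypothetical stand-ins and not the repo's own eth_to_usd implementation.

```rust
const WEI_PER_GWEI: u128 = 1_000_000_000;
const WEI_PER_ETHER: u128 = 1_000_000_000_000_000_000;

/// Gas cost of a transaction in wei for a flat gas price given in gwei.
fn gas_cost_wei(gas_used: u64, gas_price_gwei: u64) -> u128 {
    gas_used as u128 * gas_price_gwei as u128 * WEI_PER_GWEI
}

/// Profit left after paying for gas, or `None` if gas costs exceed the gross profit.
fn real_profit_wei(gross_profit_wei: u128, gas_used: u64, gas_price_gwei: u64) -> Option<u128> {
    gross_profit_wei.checked_sub(gas_cost_wei(gas_used, gas_price_gwei))
}

/// Converts wei to USD for display purposes (floating point, so not exact).
fn wei_to_usd(wei: u128, eth_price_usd: f64) -> f64 {
    wei as f64 / WEI_PER_ETHER as f64 * eth_price_usd
}
```

Plugging in 156749 gas at 15 gwei gives a gas cost of 2351235000000000 wei, and subtracting it from the gross profit of 2769314499031232 wei leaves 418079499031232 wei, i.e., the $1.59 shown above at $3800 per ETH.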

We will now proceed to reduce the gas usage of our contract and see how it can affect our profit margins.

Solidity gas cost optimizations

Let’s start tweaking the gas usage. I will provide a complete Solidity contract implementation by the end of this section.

calldata vs memory

This is the current method signature of uniswapWeth:

    function uniswapWeth(
        uint256 _wethAmountToFirstMarket,
        uint256 _ethAmountToCoinbase,
        address[] memory _targets,
        bytes[] memory _payloads
    ) external payable onlyExecutor {

Array arguments require an explicit data location, and the original code uses memory. We can reduce gas usage by rewriting the method signature like this:

    function uniswapWeth(
        uint256 _wethAmountToFirstMarket,
        uint256 _ethAmountToCoinbase,
        address[] calldata _targets,
        bytes[] calldata _payloads
    ) external payable onlyExecutor {

By using calldata instead of memory, we can reduce gas usage 156749 -> 155067, i.e., a ~1.1% improvement. Our array arguments are now read directly from the transaction calldata without first being copied to EVM memory. This change improves our profits $1.59 -> $1.68.

Avoid block.coinbase transfers

Our current method of bribing the validator uses a direct transfer to the block.coinbase:

src/bundle_executor-before...

require(_wethBalanceAfter > _wethBalanceBefore + _ethAmountToCoinbase);
if (_ethAmountToCoinbase == 0) return;

uint256 _ethBalance = address(this).balance;
if (_ethBalance < _ethAmountToCoinbase) {
    WETH.withdraw(_ethAmountToCoinbase - _ethBalance);
}
block.coinbase.transfer(_ethAmountToCoinbase);

A standalone ETH transfer transaction costs 21,000 gas units, and even an internal transfer from a contract pays a 9,000 gas surcharge for sending a non-zero value. There’s also the additional overhead of calling WETH.withdraw in case our contract is short on ETH. We could avoid it by funding the contract manually, but that means additional maintenance. Instead of using the coinbase transfer, a more efficient way is to bribe the validator by setting a higher gas price for our transaction. In that case, we have to pass the estimated gas cost as an argument and include it in the final profit validation. You can get gas usage estimates off-chain by using the eth_estimateGas RPC method. Our validation will now look like this:

require(_wethBalanceAfter > _wethBalanceBefore + _gasCost);

Implementing this logic changes gas usage 155067 -> 138492, i.e., a ~10.7% reduction! As a result, our profits increased $1.68 -> $2.63.
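Off-chain, the _gasCost argument has to be derived from a gas estimate before the transaction is sent. Here is one way it could be sketched. The safety margin in basis points is my own assumption (motivated by the eth_estimateGas error mentioned earlier), and both function names are hypothetical.

```rust
/// Converts a gas estimate into a wei amount, padding the estimate by
/// `margin_bps` basis points (700 = 7%) and rounding the padded gas up.
fn gas_cost_with_margin(estimated_gas: u64, gas_price_wei: u128, margin_bps: u64) -> u128 {
    let padded_gas = (estimated_gas as u128 * (10_000 + margin_bps as u128) + 9_999) / 10_000;
    padded_gas * gas_price_wei
}

/// Off-chain mirror of the contract check:
/// `require(_wethBalanceAfter > _wethBalanceBefore + _gasCost)`.
fn is_profitable(balance_after: u128, balance_before: u128, gas_cost: u128) -> bool {
    balance_after > balance_before.saturating_add(gas_cost)
}
```

Running the same check off-chain before submitting lets you skip transactions that would revert anyway. Keep in mind that a larger margin makes the check stricter, so some marginally profitable opportunities will be rejected.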

Avoid on-chain calls

The current version of the contract checks the WETH balance before and after executing the arbitrage swaps to ensure that we’ve made a profit. We can eliminate one call to the WETH balanceOf method by passing our current WETH balance as an argument. Our arbitrage transactions are atomic, and we can easily check the balance off-chain. By removing one external contract call, our gas usage changes 138492 -> 137745, i.e., a further ~0.5% improvement, and our profit increases by another $0.04.

Disable safe math

Since Solidity 0.8.0, arithmetic operations no longer silently wrap on overflow or underflow. Instead, they trigger a revert. It helps avoid bugs and security issues but is more gas-expensive. For our MEV bot, we can turn off these checks to slightly reduce gas usage.

unchecked {
    require(_wethBalanceAfter > _wethBalanceBefore + _gasCost);
}

Wrapping the check in an unchecked block slightly reduces the gas usage 137745 -> 137679, i.e., ~0.05%. The reduction would likely be more noticeable if your bot did more on-chain calculations.
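If you are more at home in Rust than in Solidity, the trade-off is the same as the one between checked_add and wrapping_add. The sketch below is only an analogy: it uses u64 instead of uint256, and the function names are invented for this illustration.

```rust
/// Mirrors the default Solidity >= 0.8.0 behavior: an overflow is reported
/// as an error (`None` here, a revert on-chain).
fn checked_threshold(balance_before: u64, gas_cost: u64) -> Option<u64> {
    balance_before.checked_add(gas_cost)
}

/// Mirrors an `unchecked { ... }` block: an overflow silently wraps around.
fn unchecked_threshold(balance_before: u64, gas_cost: u64) -> u64 {
    balance_before.wrapping_add(gas_cost)
}
```

For realistic balances both variants return the same threshold. They only differ when the sum overflows, in which case the wrapped result is a tiny number and the profit check would pass when it should not. That is the risk you accept in exchange for the saved gas.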

Increase compilation optimizer steps

A simple way to reduce gas usage in your smart contracts is to leverage the built-in optimizer. It’s enabled by default but configured with only 200 runs. Appending --optimizer-runs 1000000 to our contract deployment command reduces gas usage from 137679 to 137568, i.e., a ~0.08% improvement. It also increases the gas cost of contract deployment by ~15%. However, a one-time larger cost is usually acceptable in the case of an MEV bot, which is expected to execute thousands of txs. Whenever you deploy a contract, always double-check that optimization is enabled. If we disable it, the deployment cost is ~30% larger compared to the default value, and method execution is ~2.5% more expensive.

Use vanity smart contract address

The EVM charges 16 gas for every non-zero byte in transaction calldata and only 4 gas for each byte set to zero. This means that by maximizing the number of zeroes, we can reduce gas usage even further. The address of our smart contract is included in our calldata and also in the calldata of the internal swap call. This means that for each zero byte that we add to our address, we can shave off 12 gas units. You can use VanityEth to treat yourself to a fancy EOA full of zeroes. Alternatively, you can generate an EOA whose next deployed smart contract will have a chosen prefix with the following command:

vanityeth -i 00000 --contract

The number of zeros you can generate depends on your processing power and patience. To place a smart contract under a selected vanity address in our simulation, we can implement the following helper method:

src/source/anvil_utils.rs

pub async fn copy_bytecode_from(
    from: H160,
    to: H160,
    provider: Arc<Provider<Ws>>,
    anvil: &AnvilInstance,
) -> Result<()> {
    let bytecode = provider.get_code(from, None).await?;
    let call_data = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "anvil_setCode",
        "params": [format!("{:?}",to), bytecode]
    });

    let json_client = Client::new();
    json_client
        .post(anvil.endpoint())
        .json(&call_data)
        .send()
        .await?;

    Ok(())
}

and use it in simulation like this:

    let executor_template_address =
        deploy_contract("BundleExecutor-after.sol:FlashBotsMultiCall", &anvil).await?;

    let executor_address: H160 = "0x000000000000000012D4Bc56F957B3710216aB12"
        .parse()
        .unwrap();

    copy_bytecode_from(
        executor_template_address,
        executor_address,
        anvil_provider.clone(),
        &anvil,
    )
    .await?;

    let executor = IBundleExecutor::new(executor_address, anvil_provider.clone());

We first deploy a smart contract to a regular address and later copy its bytecode to any vanity address we want. This technique is also helpful for mocking bytecodes of already deployed contracts.

This fix changes gas usage 137568 -> 137472, i.e., 96 gas units of difference. You can see that our calculations play out perfectly. Our sample vanity address starts with 16 zeros, i.e., we’ve added 8 zero bytes. Each zero byte is 16 - 4 = 12 units cheaper. 8 times 12, that’s 96 quick maths.

BTW this is another example of gas optimization I could not reliably measure using Foundry. In a test simulation uniswapWeth was an internal call so it did not use an external calldata.
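The byte-level pricing is simple enough to sketch in a few lines of Rust. The constants are the 16 and 4 gas figures from above; calldata_gas is a hypothetical helper and not part of the repo.

```rust
const ZERO_BYTE_GAS: u64 = 4;
const NONZERO_BYTE_GAS: u64 = 16;

/// Gas charged for the given transaction calldata bytes
/// (only the per-byte part, without the base transaction cost).
fn calldata_gas(data: &[u8]) -> u64 {
    data.iter()
        .map(|byte| if *byte == 0 { ZERO_BYTE_GAS } else { NONZERO_BYTE_GAS })
        .sum()
}
```

An address with 20 non-zero bytes costs 320 gas each time it appears in calldata. Our sample vanity address has 8 zero bytes and 12 non-zero ones, which costs 224 gas, so the difference is exactly the 96 gas units measured above.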

Summary of Solidity-level optimizations

Here’s the uniswapWeth method with all the optimizations applied:

src/BundleExecutor-after.sol

    function uniswapWeth(
        uint256 _wethAmountToFirstMarket,
        uint256 _gasCost,
        uint256 _wethBalanceBefore,
        address[] calldata _targets,
        bytes[] calldata _payloads
    ) external payable onlyExecutor {
        require(_targets.length == _payloads.length);
        WETH.transfer(_targets[0], _wethAmountToFirstMarket);
        for (uint256 i = 0; i < _targets.length; i++) {
            (bool _success, bytes memory _response) = _targets[i].call(
                _payloads[i]
            );
            require(_success);
            _response;
        }

        uint256 _wethBalanceAfter = WETH.balanceOf(address(this));

        unchecked {
            require(_wethBalanceAfter > _wethBalanceBefore + _gasCost);
        }
    }

I won’t be pasting full codebases of simulations because there are few differences. You can check out an updated simulation version here. Running it should display the following output:

src/executor_after.rs

cargo run --bin executor_after
ETH price: $3800
WETH amount in: 100000000000000000
WETH amount out: 102769314499031232
Total WETH profit: 2769314499031232
Total USD profit: $10.52
Gas price GWEI: 15
Gas used: 137472
Gas cost: 2062080000000000
Gas cost USD: $7.84
Real WETH profit: 707234499031232
Real USD profit: $2.69

Compared to the initial version, our gas usage was reduced by ~12% from 156749 to 137472. It means that for a gas price of 15 gwei, our profit margin for this sample arbitrage improved by ~$1.10! Let’s see how we can take this further by going lower level with Yul assembly.
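As a recap, the sketch below recomputes the per-step savings from the gas figures measured in this section. The step labels are my own shorthand and the helper names are hypothetical; only the gas numbers come from the simulations above.

```rust
/// Gas used after each optimization step, in the order they were applied.
const STEPS: [(&str, u64); 7] = [
    ("initial version", 156_749),
    ("calldata instead of memory", 155_067),
    ("no coinbase transfer", 138_492),
    ("WETH balance passed as argument", 137_745),
    ("unchecked math", 137_679),
    ("1000000 optimizer runs", 137_568),
    ("vanity address", 137_472),
];

/// Relative gas reduction in percent.
fn reduction_percent(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

/// Returns `(step name, gas saved, percent saved)` relative to the previous step.
fn step_savings() -> Vec<(&'static str, u64, f64)> {
    STEPS
        .windows(2)
        .map(|pair| {
            let (_, before) = pair[0];
            let (name, after) = pair[1];
            (name, before - after, reduction_percent(before, after))
        })
        .collect()
}
```

Summing the individual savings gives 19277 gas, and removing the coinbase transfer alone accounts for 16575 of them.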

Yul gas cost optimizations

As we’ve discussed before, each non-zero byte of calldata increases the gas cost by 16 units and each zero byte by 4 units. Let’s analyze the current format of our method calldata:

let swap_tx = executor
    .uniswap_weth(
        weth_amount_in,
        U256::zero(),
        weth_amount_in,
        vec![Addresses::UniPair.addr(), Addresses::SushiPair.addr()],
        vec![swap1, swap2],
    )
    .tx;

let swap_tx = swap_tx.data().unwrap().clone();
dbg!(&swap_tx);

It should produce the following output:

0x0bd1512c
000000000000000000000000000000000000000000000000016345785d8a0000
0000000000000000000000000000000000000000000000000000763bfbd22000
000000000000000000000000000000000000000000000000016345785d8a0000
00000000000000000000000000000000000000000000000000000000000000a0
0000000000000000000000000000000000000000000000000000000000000100
0000000000000000000000000000000000000000000000000000000000000002
000000000000000000000000a478c2975ab1ea89e8196811f51a7b7ade33eb11
000000000000000000000000c3d03e4f041fd4cd388c549ee2a29a9e5075882f
0000000000000000000000000000000000000000000000000000000000000002
0000000000000000000000000000000000000000000000000000000000000040
0000000000000000000000000000000000000000000000000000000000000120
00000000000000000000000000000000000000000000000000000000000000a4
022c0d9f0000000000000000000000000000000000000000000000147e1890d0
b72b1f5500000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000c3d03e4f041fd4cd388c549ee2a29a9e
5075882f00000000000000000000000000000000000000000000000000000000
0000008000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000a4
022c0d9f00000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000016d1c25
a48670c0000000000000000000000000000000000000000002d4bc56f957b371
0216ab0000000000000000000000000000000000000000000000000000000000
0000008000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000

That’s a lot of zeros. Solidity uses an ABI encoding convention where each value takes up 32 bytes (padded with zeroes), regardless of its size. Encoding dynamic arrays also adds some overhead because we have to encode their offsets and lengths. It means that we have to pay the gas cost for a lot of empty calldata. You can check out these docs to learn more about Solidity calldata ABI encoding conventions.

Here’s how you can pack the same data more efficiently.

use alloy_sol_types::SolValue;

let packed = (
    weth_amount_in.as_u64(),
    weth_amount_in.as_u64(),
    2062080000000000_u64, // gasCost
    to_addr(Addresses::UniPair.addr()),
    to_addr(Addresses::SushiPair.addr()),
    swap1.to_vec(),
    swap2.to_vec(),
)
.abi_encode_packed();

let packed = Bytes::from(packed);
dbg!(&packed);

It should produce:

016345785d8a0000016345785d8a00000007537369e60000a478c2975ab1ea89
e8196811f51a7b7ade33eb11c3d03e4f041fd4cd388c549ee2a29a9e5075882f
022c0d9f0000000000000000000000000000000000000000000000147e1890d0
b72b1f5500000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000c3d03e4f041fd4cd388c549ee2a29a9e
5075882f00000000000000000000000000000000000000000000000000000000
0000008000000000000000000000000000000000000000000000000000000000
00000000022c0d9f000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
016d1c25a48670c0000000000000000000000000000000000000000012d4bc56
f957b3710216ab12000000000000000000000000000000000000000000000000
0000000000000080000000000000000000000000000000000000000000000000
0000000000000000

In this example, I’ve used alloy-rs, a successor to ethers-rs. I could not find an ethers-rs API to encode integer values to custom lengths. It always tries to minimize calldata size, providing inconsistent results depending on inputs. In this case, I want to use uint64 encoding for ETH input values and uint128 for swapped ERC20 token amounts. Please be aware that alloy is still in the alpha development phase and its API is likely to change.

By packing the same logical data (without the method signature) into 392 bytes instead of 772, we’ve achieved a ~50% reduction! Since we pay 4 gas units for each zero byte of calldata, we can expect a gas reduction of ~1500 units.

But this approach is still far from optimal. We’re passing the calldata bytes for the swap method in the original format. A more efficient way is to construct this calldata on-chain based on params passed to the method. More concise calldata containing all the info needed to execute our arbitrage can be generated like this:

let packed = (
    weth_amount_in.as_u64(),
    weth_amount_in.as_u64(),
    2062080000000000_u64, // gasCost
    to_addr(Addresses::UniPair.addr()),
    to_addr(Addresses::SushiPair.addr()),
    dai_amount_out.as_u128(),
    0_u16,
    weth_amount_out.as_u128(),
    1_u16,
)
    .abi_encode_packed();

let packed = Bytes::from(packed);
dbg!(packed.clone());

This example produces:

016345785d8a0000016345785d8a00000007537369e60000a478c2975ab1ea89
e8196811f51a7b7ade33eb11c3d03e4f041fd4cd388c549ee2a29a9e5075882f
00000000000000147e1890d0b72b1f5500000000000000000000016d1c25a486
70c00001

We’re now encoding 100 bytes compared to the 772 that we started from, i.e., an ~87% reduction! It should translate to over 2500 gas units saved on each arbitrage. BTW please remember that this calldata format is still far from optimal. For example, jaredfromthesubway uses 27 bytes of calldata for UniswapV2 swaps.
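Packed encoding is nothing more than concatenating big-endian bytes at their natural widths, so it can also be sketched without any ABI library. The struct and function names below are hypothetical; the field order and widths follow the tuple passed to abi_encode_packed above.

```rust
/// Arbitrage parameters in the order used by the packed calldata format.
struct PackedArbitrage {
    weth_amount_in: u64,
    weth_balance_before: u64,
    gas_cost: u64,
    target1: [u8; 20],
    target2: [u8; 20],
    amount_out1: u128,
    amount_out1_index: u16,
    amount_out2: u128,
    amount_out2_index: u16,
}

/// Concatenates every field as big-endian bytes without any padding.
/// 3 * 8 + 2 * 20 + 2 * (16 + 2) = 100 bytes in total.
fn encode_packed(args: &PackedArbitrage) -> Vec<u8> {
    let mut out = Vec::with_capacity(100);
    out.extend_from_slice(&args.weth_amount_in.to_be_bytes());
    out.extend_from_slice(&args.weth_balance_before.to_be_bytes());
    out.extend_from_slice(&args.gas_cost.to_be_bytes());
    out.extend_from_slice(&args.target1);
    out.extend_from_slice(&args.target2);
    out.extend_from_slice(&args.amount_out1.to_be_bytes());
    out.extend_from_slice(&args.amount_out1_index.to_be_bytes());
    out.extend_from_slice(&args.amount_out2.to_be_bytes());
    out.extend_from_slice(&args.amount_out2_index.to_be_bytes());
    out
}

/// Lowercase hex dump without the `0x` prefix.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|byte| format!("{:02x}", byte)).collect()
}
```

Feeding it the values from our simulation should yield the same 100 bytes as the dump above, which is a handy sanity check if you ever change the layout.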

This approach means that we’re leaving Solidity conventions behind. Here’s how you have to reimplement the smart contract with Yul to use the packed calldata format:

src/BundleExecutor-yulsol.sol

fallback() external payable onlyExecutor {
    assembly {
        // transfer weth to first market
        mstore(
            0x0,
            0xa9059cbb00000000000000000000000000000000000000000000000000000000
        )
        mstore(0x04, shr(0x60, calldataload(0x18))) // _target1
        mstore(0x24, shr(0xc0, calldataload(0x0))) // _wethAmountToFirstMarket
        pop(
            call(
                gas(),
                0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2,
                0x0,
                0x0,
                0x44,
                0x0,
                0x20
            )
        )

        // swap1
        mstore(
            0x0,
            0x022c0d9f00000000000000000000000000000000000000000000000000000000
        )
        let amoutOut1Index := shr(0xF0, calldataload(0x50)) // arg7
        let amountOut1 := shr(0x80, calldataload(0x40)) // arg6

        switch amoutOut1Index
        case 0x0 {
            mstore(0x04, amountOut1)
            mstore(0x24, 0x0)
        }
        default {
            mstore(0x04, 0x0)
            mstore(0x24, amountOut1)
        }

        mstore(0x44, shr(0x60, calldataload(0x2c))) // arg5 target2
        mstore(
            0x64,
            0x0000000000000000000000000000000000000000000000000000000000000080
        ) // empty bytes
        pop(
            call(
                gas(),
                shr(0x60, calldataload(0x18)),
                0x0,
                0x0,
                0xa4,
                0x0,
                0x0
            )
        )

        // swap2
        // skip storing swap sig it's still there
        let amoutOut2Index := shr(0xF0, calldataload(0x62)) // arg9
        let amountOut2 := shr(0x80, calldataload(0x52)) // arg8

        switch amoutOut2Index
        case 0x0 {
            mstore(0x04, amountOut2)
            mstore(0x24, 0x0)
        }
        default {
            mstore(0x04, 0x0)
            mstore(0x24, amountOut2)
        }

        mstore(0x44, address())
        // skip storing empty _data, it's still there
        pop(
            call(
                gas(),
                shr(0x60, calldataload(0x2c)),
                0x0,
                0x0,
                0xa4,
                0x0,
                0x0
            )
        )

        // check profit
        mstore(
            0x0,
            0x70a0823100000000000000000000000000000000000000000000000000000000
        ) // balanceOf Sig
        mstore(0x04, address())
        pop(
            call(
                gas(),
                0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2,
                0x0,
                0x0,
                0x24,
                0x0,
                0x20
            )
        ) // get current balance
        let balanceBefore := shr(0xc0, calldataload(0x08))
        let currentBalance := mload(0x0)
        let wethProfit := sub(currentBalance, balanceBefore)
        let gasCost := shr(0xc0, calldataload(0x10))

        if lt(wethProfit, gasCost) {
            mstore(0x0, 0x03)
            revert(0x0, 0x20)
        }

        stop()
    }
}

The above example uses Yul assembly embedded within a standard Solidity file. Explaining the details of this implementation is beyond the scope of this post. This video tutorial contains most of the information you need to understand the above code sample. You can also check out the popular subway MEV bot for a similar implementation in action.
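To make the packed layout concrete, here is a minimal Rust sketch of how the off-chain side could assemble the calldata that the fallback reads. The offsets and field widths are taken from the `calldataload` calls above; the `SwapArgs` struct, its field names, and `encode_packed` are hypothetical and are not necessarily how the repo's simulation builds the payload.

```rust
/// Arguments for the UniV2 -> UniV2 swap, in the order the contract reads them.
/// Struct and field names are illustrative assumptions; only the offsets and
/// widths come from the contract.
pub struct SwapArgs {
    pub weth_amount_to_first_market: u64, // offset 0x00, 8 bytes
    pub weth_balance_before: u64,         // offset 0x08, 8 bytes
    pub gas_cost: u64,                    // offset 0x10, 8 bytes
    pub target1: [u8; 20],                // offset 0x18, 20 bytes
    pub target2: [u8; 20],                // offset 0x2c, 20 bytes
    pub amount_out1: u128,                // offset 0x40, 16 bytes
    pub amount_out1_index: u16,           // offset 0x50, 2 bytes
    pub amount_out2: u128,                // offset 0x52, 16 bytes
    pub amount_out2_index: u16,           // offset 0x62, 2 bytes
}

/// Concatenates every field as fixed-width big-endian bytes, with no method
/// selector and no 32-byte padding. The result is always 0x64 (100) bytes.
pub fn encode_packed(args: &SwapArgs) -> Vec<u8> {
    let mut out = Vec::with_capacity(0x64);
    out.extend_from_slice(&args.weth_amount_to_first_market.to_be_bytes());
    out.extend_from_slice(&args.weth_balance_before.to_be_bytes());
    out.extend_from_slice(&args.gas_cost.to_be_bytes());
    out.extend_from_slice(&args.target1);
    out.extend_from_slice(&args.target2);
    out.extend_from_slice(&args.amount_out1.to_be_bytes());
    out.extend_from_slice(&args.amount_out1_index.to_be_bytes());
    out.extend_from_slice(&args.amount_out2.to_be_bytes());
    out.extend_from_slice(&args.amount_out2_index.to_be_bytes());
    out
}
```

Because every field sits at a fixed offset, the contract can read each one with a single `calldataload` followed by a `shr`, without any ABI decoding.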

Here’s the output of simulation for this contract:

src/executor_yulsol.rs:

cargo run --bin executor_yulsol
ETH price: $3800
WETH amount in: 100000000000000000
WETH amount out: 102769314499031232
Total WETH profit: 2769314499031232
Total USD profit: $10.52
Gas price GWEI: 15
Gas used: 131470
Gas cost: 1972050000000000
Gas cost USD: $7.49
Real WETH profit: 797264499031232
Real USD profit: $3.03

Reducing the size of the calldata and rewriting the bot in Yul reduced gas usage by over 4% (137472 -> 131470) and increased our profit by about $0.34 ($2.69 -> $3.03).

Let’s see how we can take this even further by leaving Solidity completely behind and using pure Yul for our next implementation.

How to write a MEV bot in Yul?

We’ve implemented our logic in Yul, but embedding it in Solidity unnecessarily increases gas usage. For example, Solidity spends additional gas checking each method selector in our contract, and only after finding no match does it execute our fallback function. Adding a single new method to our contract increases gas usage by 22 units. It might not seem like much, but advanced MEV bots are likely to implement hundreds of methods, making this approach inefficient.

Using transaction value is a more gas-efficient way to implement routing for your MEV smart contract. This means that sending 0 wei would execute logic for UniV2 -> UniV2 swaps, and sending one wei would trigger ERC20 token withdrawal, etc. Let’s see it in action:

contracts/BundleExecutor.yul

// SPDX-License-Identifier: MIT

object "Token" {
    code {
        // Deploy the contract
        datacopy(0, dataoffset("runtime"), datasize("runtime"))
        return(0, datasize("runtime"))
    }
    object "runtime" {
        code {
            let isOwner := eq(0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266, caller()) // only owner
            if iszero(isOwner) {
                revert(0x0, 0x0)
            }

            switch callvalue() 
            // _wethAmountToFirstMarket: uint64
            // _wethBalanceBefore: uint64
            // _gasCost: uint64
            // _target1: address
            // _target2: address
            // _token0_amountOut: u128
            // _token0_amountOutIndex: u16
            // _token1_amountOut: u128
            // _token1_amountOutIndex: u16
            case 0x0 { 
                // weth to first market
                mstore(0x0, 0xa9059cbb00000000000000000000000000000000000000000000000000000000) 
                mstore(0x04, shr(0x60, calldataload(0x18))) // arg4 target1
                mstore(0x24, shr(0xc0, calldataload(0x0))) // arg1 _wethAmountToFirstMarket
                pop(call(gas(), 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2, 0x0, 0x0, 0x44, 0x0, 0x20))

                // swap1 
                mstore(0x0, 0x022c0d9f00000000000000000000000000000000000000000000000000000000)
                let amoutOut1Index := shr(0xF0, calldataload(0x50)) // arg7 
                let amountOut1 := shr(0x80, calldataload(0x40)) // arg6

                switch amoutOut1Index
                case 0x0 {
                    mstore(0x04, amountOut1)
                    mstore(0x24, 0x0)
                } default {
                    mstore(0x04, 0x0)
                    mstore(0x24, amountOut1)
                }

                mstore(0x44, shr(0x60, calldataload(0x2c))) // arg5 target2
                mstore(0x64, 0x0000000000000000000000000000000000000000000000000000000000000080) // empty bytes
                pop(call(gas(), shr(0x60, calldataload(0x18)), 0x0, 0x0, 0xa4, 0x0, 0x0))

                // swap2 
                // skip storing swap sig it's still there

                let amoutOut2Index := shr(0xF0, calldataload(0x62)) // arg9
                let amountOut2 := shr(0x80, calldataload(0x52)) // arg8

                switch amoutOut2Index
                case 0x0 {
                    mstore(0x04, amountOut2)
                    mstore(0x24, 0x0)
                } default {
                    mstore(0x04, 0x0)
                    mstore(0x24, amountOut2)
                }

                mstore(0x44, address())
                // skip storing empty _data, it's still there
                pop(call(gas(), shr(0x60, calldataload(0x2c)), 0x0, 0x0, 0xa4, 0x0, 0x0))

                // check profit
                mstore(0x0, 0x70a0823100000000000000000000000000000000000000000000000000000000) // balanceOf Sig
                mstore(0x04, address())
                pop(call(gas(), 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2, 0x0, 0x0, 0x24, 0x0, 0x20)) // get current balance
                let balanceBefore := shr(0xc0, calldataload(0x08))
                let currentBalance := mload(0x0)
                let wethProfit := sub(currentBalance, balanceBefore)
                let gasCost := shr(0xc0, calldataload(0x10))

                if lt(wethProfit, gasCost) {
                    mstore(0x0, 0x03)
                    revert(0x0, 0x20)
                }

                stop()
            } 
            case 0x1 { //withdraw ERC20
                // TODO implement
            }
            case 0x2 { //withdraw ETH
                // TODO implement
            }
            default {
                mstore(0x00, 0x194)
                revert(0x0, 0x20)
            }
        }
    }
}
ERC20/ETH withdrawal logic is left as an exercise for the reader


Apart from the standard Yul boilerplate, the codebase is almost identical to our previous example. A notable difference is switch callvalue(), which chooses the correct execution path based on the amount of wei sent in the transaction. Here’s the result of executing this version of the contract:
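The routing rule is easy to mirror off-chain. Below is a small Rust sketch that models the `switch callvalue()` branches, so that the bot can pick the transaction value for a given action. The `Route` enum, `route`, and `callvalue_for` are hypothetical names; the value-to-branch mapping and the `0x194` revert word follow the Yul code above. `callvalue` is modeled as a `u128` for simplicity, although the EVM value is 256 bits wide.

```rust
/// Execution paths selected by the amount of wei sent with the transaction.
#[derive(Debug, PartialEq)]
pub enum Route {
    Swap,          // callvalue == 0
    WithdrawErc20, // callvalue == 1
    WithdrawEth,   // callvalue == 2
}

/// Mirrors `switch callvalue()`: known values select a route, anything else
/// falls into the default branch, which reverts with the word 0x194.
pub fn route(callvalue: u128) -> Result<Route, u32> {
    match callvalue {
        0 => Ok(Route::Swap),
        1 => Ok(Route::WithdrawErc20),
        2 => Ok(Route::WithdrawEth),
        _ => Err(0x194),
    }
}

/// Inverse mapping: the wei amount the bot attaches to trigger a given route.
pub fn callvalue_for(route: &Route) -> u128 {
    match *route {
        Route::Swap => 0,
        Route::WithdrawErc20 => 1,
        Route::WithdrawEth => 2,
    }
}
```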

src/executor_yul.rs

cargo run --bin executor_yul
ETH price: $3800
WETH amount in: 100000000000000000
WETH amount out: 102769314499031232
Total WETH profit: 2769314499031232
Total USD profit: $10.52
Gas price GWEI: 15
Gas used: 131390
Gas cost: 1970850000000000
Gas cost USD: $7.49
Real WETH profit: 798464499031232
Real USD profit: $3.03

We’ve reduced gas usage by a further ~0.06% (131470 -> 131390).

You can generate the deployed bytecode of a pure Yul contract in Remix. In our simulation, we use this helper method to place the bytecode at our vanity address:

src/source/anvil_utils.rs

pub async fn set_bytecode(bytecode: Bytes, to: H160, anvil: &AnvilInstance) -> Result<()> {
    let call_data = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "anvil_setCode",
        "params": [format!("{:?}",to), bytecode]
    });

    let json_client = Client::new();
    json_client
        .post(anvil.endpoint())
        .json(&call_data)
        .send()
        .await?;

    Ok(())
}

Now get ready for the final boss of EVM gas optimizations: THE HUFF!

How to improve EVM gas usage with Huff?

Huff is not a full-blown programming language but a set of helper macros on top of pure EVM opcodes. The learning curve could be slightly brutal, especially if you’ve never used an assembly language before. But learning the basics of Huff is an awesome way to better understand the EVM.

Let’s see how our bundle executor looks implemented in Huff:

//"transfer(address,uint256)"
#define constant ERC20_TRANSFER_SIG = 0xa9059cbb00000000000000000000000000000000000000000000000000000000
//"balanceOf(address)"
#define constant ERC20_BALANCE_OF_SIG = 0x70a0823100000000000000000000000000000000000000000000000000000000
//"swap(uint256,uint256,address,bytes)"
#define constant UNI_SWAP_SIG = 0x022c0d9f00000000000000000000000000000000000000000000000000000000

#define constant EMPTY_BYTES = 0x0000000000000000000000000000000000000000000000000000000000000080

#define constant OWNER = 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266

#define constant WETH = 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2

#define macro MAIN() = takes (0) returns (0) {
  /* Only owner */
  caller [OWNER] eq iszero accessDenied jumpi

  /* Routing */
  0x0 callvalue eq swap jumpi
  0x1 callvalue eq withdrawToken jumpi
  0x2 callvalue eq withdrawEth jumpi
  0x0 POP_REVERT()

  swap:
    SWAP()

  withdrawToken:
    WITHDRAW_TOKEN()

  withdrawEth:
    WITHDRAW_ETH()

  accessDenied:
    0x0 POP_REVERT()

  stop
}

#define macro SWAP() = takes (0) returns (0) {
  // weth to first market
  [ERC20_TRANSFER_SIG] 0x0 mstore // []
  ARG_4() 0x04 mstore // []
  ARG_1() 0x24 mstore // []
  0x20 0x0 0x44 0x0 0x0 [WETH] gas call // [success1]

  // iszero error1 jumpi
  // 0xC9 CONSOLE_LOG_INT() // 201

  // swap1
  [UNI_SWAP_SIG] 0x0 mstore // sig

  ARG_7() 0x00 eq amountOut1Index0 jumpi

  0x0 0x04 mstore // _amount0Out
  ARG_6() 0x24 mstore // _amount1Out

  doswap1 jump

  amountOut1Index0: 
  ARG_6() 0x04 mstore // _amount0Out
  0x0 0x24 mstore // _amount1Out

  doswap1:
  ARG_5() 0x44 mstore // _to
  [EMPTY_BYTES] 0x64 mstore // _data
  0x00 0x0 0xa4 0x0 0x0 ARG_4() gas call // [success2, success1]

  // swap2
  // skip storing sig it's still there
  ARG_9() 0x00 eq amountOut2Index0 jumpi

  0x0 0x04 mstore // _amount0Out
  ARG_8() 0x24 mstore // _amount1Out
  doswap2 jump

  amountOut2Index0: 
  ARG_8() 0x04 mstore // _amount0Out
  0x0 0x24 mstore // _amount1Out

  doswap2:
  address 0x44 mstore
  // skip storing empty _data, it's still there

  0x00 0x0 0xa4 0x0 0x0 ARG_5() gas call // [success3, success2, success1]

  [ERC20_BALANCE_OF_SIG] 0x0 mstore // []
  address 0x04 mstore // []
  0x20 0x0 0x24 0x0 0x0 [WETH] gas call // [success]
  ARG_2() // [balanceBefore]
  0x0 mload // [balanceBefore, currentBalance]
  sub // [wethProfit]

  ARG_3() gt noProfit jumpi // []
  stop

  noProfit:
    0x3 POP_REVERT()
}

// Each ARG_N macro pushes exactly one value onto the stack, hence returns (1)

// _wethAmountToFirstMarket
#define macro ARG_1() = takes (0) returns (1) {
  0x00 calldataload 0xc0 shr
}

// _wethBalanceBefore
#define macro ARG_2() = takes (0) returns (1) {
  0x08 calldataload 0xc0 shr
}

// _gasCost
#define macro ARG_3() = takes (0) returns (1) {
  0x10 calldataload 0xc0 shr
}

// _target1
#define macro ARG_4() = takes (0) returns (1) {
  0x18 calldataload 0x60 shr
}

// _target2
#define macro ARG_5() = takes (0) returns (1) {
  0x2c calldataload 0x60 shr
}

// amountOut1
#define macro ARG_6() = takes (0) returns (1) {
  0x40 calldataload 0x80 shr
}

// amoutOut1Index
#define macro ARG_7() = takes (0) returns (1) {
  0x50 calldataload 0xF0 shr
}

// amountOut2
#define macro ARG_8() = takes (0) returns (1) {
  0x52 calldataload 0x80 shr
}

// amoutOut2Index
#define macro ARG_9() = takes (0) returns (1) {
  0x62 calldataload 0xF0 shr
}

#define macro WITHDRAW_TOKEN() = takes (0) returns (0) {
  // TODO implement
  stop
}

#define macro WITHDRAW_ETH() = takes (0) returns (0) {
  // TODO implement
  stop
}


#define macro POP_REVERT() = takes (1) returns (0) {
  0x00 mstore
  0x20 0x00 revert
}

I won’t go into details of this implementation because it would explode the size of this already bloated post. This video from OpenZeppelin is a great introduction to Huff.
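One detail that is worth a quick illustration is how the `ARG_N` macros extract a field: `calldataload` reads a full 32-byte word starting at the given offset, and `shr` then discards everything except the leading bytes of that word. The Rust sketch below emulates both steps. The helper names `calldataload`, `shr_bytes`, and `low_u128` are my own, and the shift helper is simplified to whole-byte shifts, which is all the macros above need.

```rust
/// Emulates `calldataload(offset)`: reads 32 bytes starting at `offset`,
/// padding with zeros when the read runs past the end of the calldata.
pub fn calldataload(calldata: &[u8], offset: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    for (i, byte) in word.iter_mut().enumerate() {
        if let Some(b) = calldata.get(offset + i) {
            *byte = *b;
        }
    }
    word
}

/// Emulates `shr(shift_bits, word)` for shifts that are a multiple of 8 bits.
/// The top `shift_bits / 8` bytes of the word end up right-aligned.
pub fn shr_bytes(word: [u8; 32], shift_bits: usize) -> [u8; 32] {
    assert!(shift_bits % 8 == 0 && shift_bits <= 256);
    let shift = shift_bits / 8;
    let mut out = [0u8; 32];
    out[shift..].copy_from_slice(&word[..32 - shift]);
    out
}

/// Interprets the low 16 bytes of a word as an unsigned integer.
pub fn low_u128(word: [u8; 32]) -> u128 {
    let mut buf = [0u8; 16];
    buf.copy_from_slice(&word[16..]);
    u128::from_be_bytes(buf)
}
```

For example, `ARG_1()` corresponds to `low_u128(shr_bytes(calldataload(data, 0x00), 0xc0))`: a shift of 0xc0 (192) bits keeps only the first 8 bytes, that is the `uint64` stored at offset 0. A shift of 0x60 keeps 20 bytes (an address), 0x80 keeps 16 bytes, and 0xF0 keeps 2 bytes.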

BTW, here are a few useful helper macros that let you call console.log from Huff via the special Foundry/Hardhat logging contract:

#define macro CONSOLE_LOG_INT() = takes (1) returns (0) {
    // "log(int)"
    0x4e0c1d1d00000000000000000000000000000000000000000000000000000000
    CONSOLE_LOG_BASE()
}

#define macro CONSOLE_LOG_ADDRESS() = takes (1) returns (0) {
    // "log(address)"
    0x2c2ecbc200000000000000000000000000000000000000000000000000000000
    CONSOLE_LOG_BASE()
}

#define macro CONSOLE_LOG_BASE() = takes (2) returns (0) {
  0x400 mstore
  0x404 mstore
  0x0 0x0 0x24 0x400 0x000000000000000000636F6e736F6c652e6c6f67 gas staticcall
  pop
}

You can generate Huff contract bytecode by running this command:

huffc contracts/BundleExecutor.huff -b -e paris > bytecode/BundleExecutor-huff.hex

Later, you can use this Solidity helper contract to deploy the generated bytecode e.g., to a Remix VM and get its deployed format at the selected address:

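contract DeployBytecode {
    // Deploys raw creation bytecode and returns the address of the new contract.
    function deployBytecode(bytes memory bytecode) public returns (address) {
        address retval;
        assembly {
            // `bytecode` points at the length word; the code itself starts 0x20 bytes later.
            retval := create(0, add(bytecode, 0x20), mload(bytecode))
        }
        // create returns the zero address when the deployment fails
        require(retval != address(0), "deployment failed");
        return retval;
    }

    // Returns the runtime (deployed) bytecode stored at `_addr`.
    function bytecodeAt(address _addr) public view returns (bytes memory o_code) {
        return _addr.code;
    }
}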

Here are the results of simulating the Huff version of BundleExecutor:

src/executor_huff.rs

cargo run --bin executor_huff
ETH price: $3800
WETH amount in: 100000000000000000
WETH amount out: 102769314499031232
Total WETH profit: 2769314499031232
Total USD profit: $10.52
Gas price GWEI: 15
Gas used: 131370
Gas cost: 1970550000000000
Gas cost USD: $7.49
Real WETH profit: 798764499031232
Real USD profit: $3.03

(╯°□°)╯︵ ┻━┻

So, only 20 gas units less than the pure Yul version…

But compared to Yul, Huff enables a range of low-level optimizations and tricks. One notable example is a wall of stops from the rusty-sando MEV bot.

Similar “gas golfing” is necessary if you want to compete on Mainnet, and in that case Huff might give you a competitive edge.

MEV bot gas optimizations progress

Let’s quickly recap gas usage and profit margins of different arbitrage contract versions:

Contract                    Gas used   Gas cost   MEV profit
BundleExecutor-before.sol   156749     $8.93      $1.59
BundleExecutor-after.sol    137472     $7.84      $2.69
BundleExecutor-yulsol.sol   131470     $7.49      $3.03
BundleExecutor.yul          131390     $7.49      $3.03
BundleExecutor.huff         131370     $7.49      $3.03


We’ve increased our profit by roughly 90% ($1.59 -> $3.03)! And these calculations assume a relatively low gas price of 15 GWEI. But it’s up to you to decide which level of additional complexity is worth it.
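The recap figures can be re-derived from the gas numbers alone. The sketch below is not taken from the repo; the function names are hypothetical. It assumes the inputs printed by the simulations: 2769314499031232 wei of gross WETH profit, a 15 GWEI gas price, and an ETH price of $3800.

```rust
const WEI_PER_GWEI: i128 = 1_000_000_000;
const WEI_PER_ETH: f64 = 1e18;

/// Gas cost in wei for a given gas usage and gas price in GWEI.
pub fn gas_cost_wei(gas_used: i128, gas_price_gwei: i128) -> i128 {
    gas_used * gas_price_gwei * WEI_PER_GWEI
}

/// Profit left after paying for gas; negative when the trade loses money.
pub fn real_profit_wei(total_profit_wei: i128, gas_used: i128, gas_price_gwei: i128) -> i128 {
    total_profit_wei - gas_cost_wei(gas_used, gas_price_gwei)
}

/// Converts a wei amount to USD at the given ETH price.
pub fn wei_to_usd(wei: i128, eth_price_usd: f64) -> f64 {
    wei as f64 / WEI_PER_ETH * eth_price_usd
}

/// Relative change between two values, in percent.
pub fn percent_increase(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

With these helpers, `real_profit_wei(2769314499031232, 131370, 15)` gives the 798764499031232 wei reported for the Huff version, and `percent_increase(1.59, 3.03)` comes out at a little over 90%. The USD values land within a cent of the table, with small differences depending on where the rounding is applied.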

Summary

Thanks for sticking with me to the end. Implementing gas optimizations is a fantastic way to improve your understanding of the EVM’s inner workings. I’ve focused on the bare minimum implementation to showcase possible gas optimizations. So please remember that the presented versions of the contracts are not production-ready.


