<< Part IV | This is Part V of a multi-part series on Bitcoin block propagation with IBLT.
IBLT part V: Simulating reality
Rusty Russell has dumped mempool data from 4 different nodes around the globe. It covers a week’s worth of data. The purpose of the dumps was to get an idea on how similar the mempools are. The effectiveness of IBLTs is highly dependent on the similarity of mempools. The data covers blocks at height 352305 thru 353025. For every block and node, it will list:
- mempool-only: Transactions that are not in the block, but present in the mempool
- plus-transactions: Transactions present in the block, but not in the mempool
- zero-transactions: Transactions present both in the block and in the mempool
From this we can pretend that a node is receiving a block and that the plus-transactions are exactly the ones listed in the data. We can’t say from this data alone what the set of minus-transactions (transactions that the receiver removes from the IBLT, but wasn’t present in the IBLT) would be, but we assume that there are equally many minus-transactions as there are plus-transactions. The minus-transactions will be selected randomly from the mempool-only set.
Now, we use this as input to determine the IBLT sizes for several different failure probabilities:
Some stats from the data set:
Average block size [Bytes]: 381891
Average tx count per block: 715
Sample count : 722 (One fork included)
This tells us that with an IBLT of size 21600 bytes, we could have successfully transferred 95% of the blocks to the Australian node. That’s about 5.7% of the average block size. Also if we want to go super-small and if we can put up with 8% failure probability, we can use an IBLT of 5760 bytes or about 1.5% of the average block size
The result above is just a first step. There are several optimizations to be made to further shrink the IBLT:
- Added set: A set of hash prefixes that the sender can add to the message to signal “These transactions were added in spite of my fee hint”
- Removed set: A set of hash prefixes meaning “These transactions were excluded from the block in spite of my fee hint”
- Literal set: A set of transactions that the sender thinks is unlikely to be present in the receiver’s mempool. The coinbase will be here, but probably also transactions that the sender didn’t have when receiving the block.
In part VI of this series I’ll show you results from scaling up the corpus data 10x, 100x and 1000x. Or if we have results from the above mentioned optimizations, I may decide to write about that instead. Stay tuned!
Yay, I just replicated your results!
In particular, using 337 buckets (slices of 64 bytes), across blocks 352306 to 353009 inclusive in the corpus, simulating the case where the first peer to see a block broadcasts it to the other three peers, they successfully decode 1987 of 2112 blocks (ie. 94%).
This uses the fee hint and added/removed sets. The total data sent for these (including all overheads, such as block header and coinbase) is 52,953,882 bytes. A raw send of this data would be 813,371,931 bytes, so we’re at 6.5%!
Cool!
Why did you exclude block 352305 and 353010-353025?
The average data sent per block and peer would be 52953882/2112/3=25072 for 6% failure probability.
I do think it’s weird that you don’t get a lower failure probability since you’re using fee hint and added/removed sets.
I will set up the test with fee hint and added/removed sets as well to see what I get.
> Why did you exclude block 352305 and 353010-353025?
352305 was an out-by-one error on my part, and I got weird results after 353010 (I think there’s another orphan in there, but I ran out of time). This is the weak-block/ github data.
> I do think it’s weird that you don’t get a lower failure probability since you’re using fee hint and added/removed sets.
That is weird; but our methodology is slightly different. I am taking the first peer which sees the block, and encoding it as that peer to each of the others. You use random minus transactions, so I’m just happy we’re in the same ballpark…