A simple model for network data size and disk space usage of a fediverse instance - wimvanderbauwhede/limited-systems GitHub Wiki

A simple model for network data size and disk space usage of a fediverse instance

I wondered how many resources a fedi instance would consume, so I made a simple model.

tl;dr: I put it in a spreadsheet on Ethercalc, and there is an off-line version in .ods format. If you don't have a spreadsheet program that can read this format, you can drop it on Ethercalc.

But I do hope you'll read the post to see the assumptions behind the model.

Assumptions

  • Number of active users: n_users
  • Average number of posts per user per day: n_posts
  • Average post size (in kB): post_sz_in, post_sz_out (inbound and outbound traffic, see below)
  • Average number of followe{r,d}s:
    • Average number of external followe{r,d}s (i.e. on different instances): n_follows_ext
    • Average number of local followe{r,d}s (i.e. on your instance): n_follows_loc < n_users
    • Total average number of followe{r,d}s: n_follows

So I assume parity between followers and followeds: on average, every user follows as many accounts as follow them, both locally and externally.

Network data size

In the worst case, every message a user posts is received by all users of the instance (local timeline) and by all external followers of the user.

Now, we don't know the total number of unique external followers, but we can estimate this as follows:

  • If every user had the same external followers, there would be n_follows_ext
  • If every user had different external followers, there would be n_users*n_follows_ext

These are the lower and upper bounds, so the actual number will be in between. A reasonable approximation is to use the geometric mean, so

sqrt(n_follows_ext*(n_users*n_follows_ext)) = sqrt(n_users)*n_follows_ext

as an approximation of the total number of unique external followers.
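As a quick numerical check of the geometric-mean estimate, here is a minimal Python sketch. It applies the estimate to the external follow count n_follows_ext, as the inbound traffic formula further down does. The function name and the sample value of 100 external followers are illustrative assumptions, not part of the spreadsheet.

```python
import math

def unique_external_followers(n_users: int, n_follows_ext: float) -> float:
    """Geometric mean of the two bounds on unique external followers."""
    lower = n_follows_ext             # everyone shares the same followers
    upper = n_users * n_follows_ext   # nobody shares any follower
    return math.sqrt(lower * upper)   # equals sqrt(n_users) * n_follows_ext

# Illustrative value: 100 external followers per user on average
for n_users in (1, 10, 100, 1000):
    estimate = unique_external_followers(n_users, 100)
    print(f"{n_users:5d} users -> about {estimate:.0f} unique external followers")
```

For a single user the two bounds coincide, so the estimate is exactly n_follows_ext; for larger instances it grows with the square root of the number of users.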

Outbound traffic

On a daily basis, the outbound traffic would be the posts of all users received by all other users and all external followers of each user. Counting all local users is the worst case, in which every user has the local timeline loading all the time.

data_sz_out_worst = n_users*(n_users + n_follows_ext)*n_posts*post_sz_out

A less pessimistic estimate is to use all followers, i.e. users are looking at their home timeline. In that case, we have

data_sz_out = n_users*n_follows*n_posts*post_sz_out
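The two outbound estimates can be compared side by side with a small Python sketch. The sample values (100 users, 100 external and 20 local followers, 4 posts of 50 kB per day) are assumptions chosen only to show the difference between the two formulas.

```python
def data_sz_out_worst(n_users, n_follows_ext, n_posts, post_sz_out):
    """Worst case: every post goes to every local user plus external followers."""
    return n_users * (n_users + n_follows_ext) * n_posts * post_sz_out

def data_sz_out(n_users, n_follows, n_posts, post_sz_out):
    """Home-timeline case: every post goes only to the poster's followers."""
    return n_users * n_follows * n_posts * post_sz_out

# Hypothetical sample values, sizes in kB
n_users, n_follows_ext, n_follows_loc = 100, 100, 20
n_posts, post_sz_out = 4, 50

worst = data_sz_out_worst(n_users, n_follows_ext, n_posts, post_sz_out)
home = data_sz_out(n_users, n_follows_ext + n_follows_loc, n_posts, post_sz_out)

print(f"worst case:    {worst / 1e6:.1f} GB/day")
print(f"home timeline: {home / 1e6:.1f} GB/day")
```

The two estimates coincide when every user follows every other local user, and the gap widens as n_follows_loc drops below n_users.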

Inbound traffic

The inbound traffic is the sum of all posts by all local users and the posts by all unique external followeds, so that would be:

data_sz_in = (n_users + sqrt(n_users)*n_follows_ext)*n_posts*post_sz_in

In other words, for equal post sizes the inbound traffic is smaller than the outbound traffic, because the per-user factor 1+n_follows_ext/sqrt(n_users) is smaller than n_follows as soon as users have at least one local follower.
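To see how the inbound side behaves, the sketch below evaluates the inbound formula and the per-user factor 1+n_follows_ext/sqrt(n_users) for a few instance sizes. The value of 100 external follows and the helper names are assumptions for illustration.

```python
import math

def data_sz_in(n_users, n_follows_ext, n_posts, post_sz_in):
    """Daily inbound traffic: local posts plus posts from unique external followeds."""
    return (n_users + math.sqrt(n_users) * n_follows_ext) * n_posts * post_sz_in

def inbound_factor(n_users, n_follows_ext):
    """data_sz_in divided by n_users*n_posts*post_sz_in."""
    return 1 + n_follows_ext / math.sqrt(n_users)

# Assumed value: 100 external follows per user
for n_users in (1, 10, 100, 1000):
    print(f"{n_users:5d} users: per-user inbound factor "
          f"{inbound_factor(n_users, 100):.2f}")
```

The factor shrinks as the instance grows, because a larger share of the followed accounts is shared between local users.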

Disk space

The server will store all unique posts, so that is all posts by the instance users and those by the external followeds.

disk_space = (n_users + n_follows_ext)*n_posts*post_sz_out

Illustration for typical instance sizes

To see what this tells us, let's put some numbers on this. Let's assume every user posts about 2MB of data per day (e.g. 4 pictures of 500kB). Then the incoming traffic per user is:

n_posts*post_sz_in = 2MB/day

However, the outgoing traffic is lower because the images are reduced to thumbnails of around 50kB on Mastodon, but not on Pleroma, so

n_posts*post_sz_out = 2MB (Pleroma)
n_posts*post_sz_out = 200kB (Mastodon)

These figures effectively include text posts, as those are on average only about 100B to 200B and so are negligible next to the images.
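The per-user daily payloads above follow from simple arithmetic, sketched here in Python. The count of 10 text posts per day is a hypothetical value, used only to show how little text adds.

```python
PICTURES_PER_DAY = 4
ORIGINAL_KB = 500     # full-size picture
THUMBNAIL_KB = 50     # Mastodon thumbnail
TEXT_POST_KB = 0.2    # upper end of the 100B to 200B range
TEXT_POSTS_PER_DAY = 10   # hypothetical, not from the post

inbound_kb = PICTURES_PER_DAY * ORIGINAL_KB              # 2 MB
outbound_pleroma_kb = PICTURES_PER_DAY * ORIGINAL_KB     # no thumbnailing
outbound_mastodon_kb = PICTURES_PER_DAY * THUMBNAIL_KB   # 200 kB

text_kb = TEXT_POSTS_PER_DAY * TEXT_POST_KB
text_share = text_kb / outbound_mastodon_kb

print(f"inbound:             {inbound_kb} kB/day")
print(f"outbound (Pleroma):  {outbound_pleroma_kb} kB/day")
print(f"outbound (Mastodon): {outbound_mastodon_kb} kB/day")
print(f"text share of the Mastodon payload: {text_share:.1%}")
```

Even against the smaller Mastodon payload, ten text posts add only about one percent.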

Let's assume a user has on average 100 external followers/followeds.

n_follows_ext = 100

Now let's assume a few different instance sizes:

Single user

Say you want to host from home on a Raspberry Pi:

data_sz_out = data_sz_in = disk_space = (1+100)*2 = 202MB/day or 6GB/month or 72GB/year. 

This would be well below the typical data allowance of most service providers. The disk space requirement means you'd need a 64GB to 128GB micro-SD card per year. So this is not a problem.
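A short sketch of the single-user arithmetic, including how long a given card would last. It assumes decimal units (1 GB = 1000 MB), a 30-day month, and that nothing is ever deleted.

```python
MB_PER_DAY = (1 + 100) * 2   # one user plus 100 external followeds, 2 MB each

per_month_gb = MB_PER_DAY * 30 / 1000
per_year_gb = MB_PER_DAY * 365 / 1000

def days_to_fill(card_gb: float) -> float:
    """Days until a card of the given size is full, ignoring OS and database overhead."""
    return card_gb * 1000 / MB_PER_DAY

print(f"{MB_PER_DAY} MB/day, {per_month_gb:.1f} GB/month, {per_year_gb:.1f} GB/year")
for card_gb in (64, 128):
    print(f"{card_gb} GB card lasts about {days_to_fill(card_gb):.0f} days")
```

With 365 days the yearly figure comes out slightly above the rounded 72GB, and a 64GB card would fill in a bit under a year.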

10 active users

Data size

  • Inbound:

      data_sz_in = (10 + 3.16*100)*2 = 652MB/day, or 20GB/month
    
  • Outbound:

      data_sz_out = 10*(10 + 100)*{.2,2}MB = {.22,2.2}GB/day or {6.6,66} GB/month
    

That is still OK for most people's allowance. My provider doesn't allow more than 50 GB/month and my current plan is 15GB, so I could probably host such an instance on my home internet connection.
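The monthly figures for the 10-user case can be checked against a data cap with a sketch like the following. The 30-day month, and the choice to compare each direction separately against the 50 GB cap, are assumptions; providers differ in what they count.

```python
import math

N_USERS = 10
N_FOLLOWS_EXT = 100
IN_MB = 2.0                                  # inbound per user per day
OUT_MB = {"Mastodon": 0.2, "Pleroma": 2.0}   # outbound per user per day
CAP_GB = 50                                  # monthly cap mentioned in the text
DAYS = 30                                    # assumed month length

def per_month_gb(mb_per_day: float) -> float:
    return mb_per_day * DAYS / 1000

inbound = per_month_gb((N_USERS + math.sqrt(N_USERS) * N_FOLLOWS_EXT) * IN_MB)
outbound = {
    name: per_month_gb(N_USERS * (N_USERS + N_FOLLOWS_EXT) * mb)
    for name, mb in OUT_MB.items()
}

print(f"inbound: {inbound:.1f} GB/month, within cap: {inbound <= CAP_GB}")
for name, gb in outbound.items():
    print(f"outbound ({name}): {gb:.1f} GB/month, within cap: {gb <= CAP_GB}")
```

Under these assumptions the inbound traffic and the Mastodon outbound traffic each fit within 50 GB/month, while the Pleroma outbound figure does not.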

Disk space

disk_space = (10+100)*2 = 220MB/day, 6.6GB/month

This is quite similar to the single-user case, because the number of external followers is larger than the number of users.

100 active users

Data size

  • Inbound:

      data_sz_in = (100+10*100)*2 = 2.2GB/day
    
  • Outbound:

    The outbound data size becomes seriously large:

      data_sz_out = 100*(100+100)*{0.2,2} = {4GB,40GB}/day 
    

Disk space

The disk space however only doubles:

disk_space = (100+100)*2 = 400MB/day

1000 active users

Data size

Things get even more demanding with 1000 active users:

  • Inbound:

      data_sz_in = (1000+31.6*100)*2 = 8.3GB/day
    
  • Outbound:

      data_sz_out = 1000*(100+100)*{.2,2} = {40,400}GB/day
    

It's worth noting that sending out 400GB/day requires a sustained network bandwidth of about 40Mb/s. Here in the UK the average is 40Mb/s for a home connection, but this is incoming. So hosting a 1000-user instance from your home internet is not a good idea.
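The conversion from a daily volume to a sustained bit rate is sketched below. It assumes decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits) and traffic spread evenly over the day; real traffic is bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def gb_per_day_to_mbit_per_s(gb_per_day: float) -> float:
    """Average bit rate needed to move the given daily volume."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6

for gb in (40, 400):
    print(f"{gb} GB/day needs about {gb_per_day_to_mbit_per_s(gb):.1f} Mb/s sustained")
```

For 400GB/day this gives roughly 37 Mb/s, in line with the rounded figure of 40Mb/s.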

Disk space

And here the disk space becomes a bit more of an issue too:

disk_space = (1000+100)*2 = 2.2GB/day, so about 800GB/year
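To compare disk growth across the instance sizes, this sketch evaluates the same expression, (n_users + 100)*2 MB/day, for each of them. It assumes decimal units and a 365-day year with no pruning of old media.

```python
N_FOLLOWS_EXT = 100        # external followeds, as assumed in the illustration
MB_PER_USER_PER_DAY = 2    # 4 pictures of 500 kB

def disk_mb_per_day(n_users: int) -> int:
    return (n_users + N_FOLLOWS_EXT) * MB_PER_USER_PER_DAY

for n_users in (1, 10, 100, 1000):
    daily_mb = disk_mb_per_day(n_users)
    yearly_gb = daily_mb * 365 / 1000
    print(f"{n_users:5d} users: {daily_mb:5d} MB/day, about {yearly_gb:.0f} GB/year")
```

Up to about 100 users the external followeds dominate the total, which is why disk usage barely moves between the single-user and 10-user cases.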

Conclusion

The main issue with hosting a fediverse instance is the amount of data it sends on the network, which grows almost as the square of the number of users. This is because essentially every message of a single user goes to all users. Disk space grows with the number of users and is generally speaking not an issue.
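One way to make "almost as the square" concrete is to estimate the local growth exponent, i.e. the slope on a log-log plot, of the worst-case outbound traffic and of the disk space. This is a sketch under the illustration's assumption of 100 external followers; the helper names are made up for this example.

```python
import math

N_FOLLOWS_EXT = 100

def outbound_worst(n_users: int) -> int:
    """Worst-case outbound traffic, in units of one user's daily payload."""
    return n_users * (n_users + N_FOLLOWS_EXT)

def disk(n_users: int) -> int:
    """Disk space, in units of one user's daily payload."""
    return n_users + N_FOLLOWS_EXT

def local_exponent(f, n_users: int, factor: int = 10) -> float:
    """Log-log slope of f between n_users and factor*n_users."""
    return math.log(f(factor * n_users) / f(n_users)) / math.log(factor)

for n in (10, 100, 1000):
    print(f"{n:5d} -> {10 * n:6d} users: "
          f"outbound exponent {local_exponent(outbound_worst, n):.2f}, "
          f"disk exponent {local_exponent(disk, n):.2f}")
```

The outbound exponent climbs from about 1.3 towards 2 as the instance grows past the number of external followers, while the disk exponent stays below 1.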