Verification of computation - golemfactory/golem-rd GitHub Wiki
See also Verification
Verification of computation is tricky because it involves cheating, software failures and possible nondeterminism. A provider may want to cheat by delivering junk, and a requestor may want to cheat by not paying or by paying far less than agreed. When a provider and a requestor exchange work for money, there is the problem of who delivers first. If the provider computes and hands over the output first, he risks being cheated by a requestor who chooses not to pay. Conversely, if the requestor pays first, the provider may choose to cheat and skip the computation.
Verification after result delivery
If the requestor has the plaintext of the output, he can check it in multiple ways. The check might itself be non-trivial, but at least it is doable. Possible approaches include sampling (e.g. recomputing a few pixels of a rendered scene), full local replication, and correctness checks (it is easy to verify a graph coloring but hard to find one), etc. The problem is that the requestor may simply choose not to pay, since he already has the result. Up-front payment is also possible, but then the requestor has no guarantee that he will get any result at all.
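The sampling idea above can be sketched as follows. This is a minimal illustration, not Golem's actual verification code; `compute_element` is a hypothetical deterministic function that recomputes a single element (e.g. one pixel) of the output.

```python
import random

def verify_by_sampling(task_input, provider_output, compute_element, k=16):
    """Spot-check a delivered result: recompute k randomly chosen
    elements locally and compare them to the provider's output."""
    indices = random.sample(range(len(provider_output)), k)
    return all(compute_element(task_input, i) == provider_output[i]
               for i in indices)

# Toy usage: "rendering" element i is just squaring i.
honest = [i * i for i in range(100)]
junk = [0] * 100
assert verify_by_sampling(None, honest, lambda _inp, i: i * i)
assert not verify_by_sampling(None, junk, lambda _inp, i: i * i)
```

The check is cheap relative to full replication, but only probabilistic: junk that agrees on the sampled elements slips through.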
Verification before result delivery
The ideal solution to this problem is a protocol that atomically swaps the results of computation for payment. Let's assume we have such a protocol (Atomic swap). We have now established that the results of computation cannot be revealed before the swap. How can validation be done in this situation?
Local replication
The requestor can do the computation himself and compare the hash of his output to the hash of the provider's output. To make this efficient, the requestor should do it for just 1 of n subtasks. This means a cheating provider has a 1/n chance of getting caught. Is cheating worth it? The chance of being caught is low, but since the Golem market is likely to be an oligopsony ("few buyers"), being blacklisted by a requestor may be a serious loss, unless it is cheap to create a new identity.
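A rough expected-value sketch of this trade-off (all numbers and the blacklisting-cost model are illustrative assumptions, not part of any protocol):

```python
def expected_profits(payment, cost_of_compute, n, blacklist_loss):
    """Compare honest work against returning junk on a subtask, when the
    requestor locally re-verifies 1 of n subtasks. `blacklist_loss` is
    the assumed present value of future income lost if the cheat is
    detected and the provider is blacklisted by this requestor."""
    p_caught = 1.0 / n
    honest = payment - cost_of_compute
    cheating = (1 - p_caught) * payment - p_caught * blacklist_loss
    return honest, cheating

# With a costly identity, cheating does not pay even at n = 100:
honest, cheating = expected_profits(payment=10, cost_of_compute=8,
                                    n=100, blacklist_loss=1000)
assert cheating < honest
```

If new identities are nearly free, `blacklist_loss` drops toward zero and the inequality flips, which is exactly the caveat above.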
Here is a protocol built around a combination of local verification and a lottery.
Because verification is done on the requestor's machine, the requestor's hardware limits both which tasks he can check and how many subtasks he can check. The probability of a subtask being verified is therefore inversely proportional to the speed-up the requestor gets from using Golem, which is a severe limitation.
Replication by network
To avoid this limitation we need to find a way for providers to cross-check each other.
The simplest way: give the same subtask to two providers, ask each to publish the hash of his output, and compare. If the hashes are identical, the providers were diligent and you can move on to swapping the output for money.
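The hash comparison step is straightforward; a minimal sketch using SHA-256 (the particular hash function is an assumption):

```python
import hashlib

def output_hash(output_bytes: bytes) -> str:
    """Each provider publishes only the hash of its output, so results
    can be cross-checked without revealing them before the swap."""
    return hashlib.sha256(output_bytes).hexdigest()

h1 = output_hash(b"rendered-frame-bytes")   # provider A
h2 = output_hash(b"rendered-frame-bytes")   # provider B, same subtask
h3 = output_hash(b"junk")                   # a cheater or a fault

assert h1 == h2   # diligent providers agree
assert h1 != h3   # a deviating result stands out
```

Note that this comparison only works if the computation is deterministic; nondeterminism shows up as a hash mismatch indistinguishable from cheating, a point the failure-mode discussion below returns to.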
Problems:
- the requestor does not really want to pay two people for the same good
- collusion of providers
Why pay for the same data twice?
But why would you ever want to pay twice for the same output? The problem is that it is impossible to prevent two parties from dealing with each other behind the back of a third. So instead of trying to prevent such deals, we attempt to embrace them. Bitcoin miners dutifully perform computations even though they are not guaranteed to make money on each block. They do it because they know that in the long run the scheme is profitable and fair. What can we do to replicate these properties?

First, we need to make sure that providers do not know which subtasks are replicas and which are unique. Otherwise they may simply bail out once they know they are computing replicas and their chance of being paid is below 100%. This can be done by taking subtask+salt and encrypting it (let's call the result a blinded input). Next, once a provider has given the network his signed commitment to compute a particular blinded input, we can unblind it. This prevents providers from picking and choosing what they will compute in a single round. But it still does not give them an expectation of income, since they do not know what percentage of subtasks are replicas. Fortunately, we can include this information in the description of the task, published and signed by the requestor; afterwards it is possible to check whether the requestor was lying.
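The blinding and unblinding steps can be sketched with a salted hash commitment. This is one possible construction consistent with the description above (the wiki says "encrypting"; a hash commitment is an assumed concrete choice, as are all function names):

```python
import hashlib
import os

def blind(subtask_id: bytes):
    """Requestor blinds a subtask as H(subtask || salt). The random salt
    hides whether two blinded inputs refer to the same subtask, so a
    provider cannot tell replicas from unique work before committing."""
    salt = os.urandom(16)
    return hashlib.sha256(subtask_id + salt).digest(), salt

def check_unblinding(blinded: bytes, subtask_id: bytes, salt: bytes) -> bool:
    """After committing, a provider checks that the revealed
    (subtask, salt) pair really matches the blinded input."""
    return hashlib.sha256(subtask_id + salt).digest() == blinded

# Two replicas of the same subtask get different blinded inputs:
b1, s1 = blind(b"subtask-7")
b2, s2 = blind(b"subtask-7")
assert b1 != b2
assert check_unblinding(b1, b"subtask-7", s1)
```

The signed commitment to a specific blinded input, and the signed task description stating the replica percentage, would sit on top of this and are omitted here.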
This scheme can still be manipulated by a requestor who picks which provider computes what. Since we want the protocol to be fair, let's create a list of blinded inputs, and then have all parties (the requestor plus the providers) jointly generate a random number. This random number can be used to determine who computes what.
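A common way to generate such a joint random number is a commit-reveal round, with the XOR of everyone's contributions seeding a deterministic assignment. This is an assumed construction for illustration (it is still biasable by a last-revealer who aborts; the linked page on network random-number generation addresses that):

```python
import hashlib
import os
import random

# Each party commits to a secret random value by publishing its hash.
parties = {name: os.urandom(32) for name in ["requestor", "p1", "p2", "p3"]}
commitments = {name: hashlib.sha256(v).digest() for name, v in parties.items()}

# After all commitments are published, values are revealed and checked:
for name, value in parties.items():
    assert hashlib.sha256(value).digest() == commitments[name]

# XOR the revealed values; no single party controls the result.
seed = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*parties.values()))

# The shared seed deterministically maps providers to blinded inputs,
# so everyone can recompute and audit the assignment.
providers = ["p1", "p2", "p3"]
blinded_inputs = ["in-a", "in-b", "in-c"]
rng = random.Random(seed)
rng.shuffle(blinded_inputs)
assignment = dict(zip(providers, blinded_inputs))
```

Because the assignment is a pure function of the seed and the published lists, a requestor cannot quietly steer particular subtasks to particular providers.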
Fast forward: all providers have completed their computations, and two of them discover that they were computing the same subtask. Which one of them uploads the result and receives the money? If we leave this decision to the requestor, we give him a lot of bargaining power: "give me the result for half the price or I will buy it from the other guy". This makes the whole process unfair and would reduce interest from potential providers. To prevent it, we first reveal who was doing replicated work, and then generate one more random number to determine which of the providers who did replicated work gets paid and which does not.
What conditions should this number satisfy? First, it should be fair (it cannot be manipulated by any of the interested parties). Second, it should be unknown to participants while they are computing subtasks and become known once the mapping between subtasks and inputs is known.
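Given such a number, the payment lottery itself is trivial: seed a deterministic choice over the replica group, so every party can recompute who gets paid. A minimal sketch (function names are assumptions):

```python
import random

def pick_paid_provider(replica_providers, joint_random_number):
    """Once the replica pairs are revealed, the second jointly generated
    random number deterministically selects which provider in the group
    is paid. Sorting makes the result independent of set ordering, and
    determinism removes the requestor's bargaining power."""
    rng = random.Random(joint_random_number)
    return rng.choice(sorted(replica_providers))

winner = pick_paid_provider({"p1", "p2"}, joint_random_number=42)
assert winner in {"p1", "p2"}
# Every party computes the same winner from the same inputs:
assert winner == pick_paid_provider({"p2", "p1"}, joint_random_number=42)
```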
To solve this problem we need to have a way to [generate random numbers as a network](Random number generation by network).
What happens if the requestor also controls some of the providers? Can he exploit this to freeload? Yes, although it is not very likely under normal conditions. After the work-assignment process is completed, the requestor knows which nodes will ultimately not get paid, since he knows the preimage. If it happens that he controls all of those nodes, he can choose not to pay and still buy all the inputs he is interested in. This would be a big hit to his reputation, but because of the strong economic position of buyers in the Golem market (= oligopsony), buyers might still entertain the idea.
How can this protocol fail? One mode: two providers compute the same subtask but publish different hashes. What went wrong? There are three possibilities: software failure, hardware failure, or collusion. Faulty software (i.e. software that is non-deterministic despite efforts to make the computational environment fully deterministic) is detectable by elevated failure rates for a particular application. Faulty hardware is the provider's problem: he should fix it or he will not get paid. The same goes for attempts to skip the computation and return junk instead of results. There is one more failure mode.
Collusion problem
Two providers may want to collude. They can exchange information about the inputs they receive, and if it turns out that they are computing the same subtask, they can choose to skip the computation and return identical junk. This problem might be serious.