398. Random Pick Index - cocoder39/coco39_LC GitHub Wiki

398. Random Pick Index

note: a fb variant. one pass get the index of the max value

def pick(nums) -> int:
    count = 0
    res = -1
    max_val = -float('inf')
    
    for i, num in enumerate(nums):
        if num > max_val:
            max_val = num
            count = 1
            res = i
        elif num == max_val:
            count += 1
            if random.randint(1, count) == count:
                res = i
    
    return res

Notes 2020:

Option 1: Reservoir Sampling (Preferred)

Time complexity is O(N) for pick Space complexity is O(1) (except for storing the initial array. In practice reservoir sampling can deal with stream so there is no need to store the data set)

class Solution:

    def __init__(self, nums: List[int]):
        self.nums = nums

    def pick(self, target: int) -> int:
        res = -1;
        count = 0
        for i, num in enumerate(self.nums):
            if num == target:
                count += 1
                if random.randint(1, count) == count:
                    res = i
        return res
  • When we encounter the 1st occurrence of target at some index i1, count is 1. The probability of picking this index is: 1/1 = 1
  • When we encounter the 2nd occurrence at some index i2, count becomes 2. The probability of replacing the previously picked index i1 with i2 is: 1/2; the probability of i1 remains selected is 1-1/2=1/2
  • When we encounter the 3rd occurrence at some index i3, count becomes 3. The probability of replacing the previously picked index with i3 is: 1/3; the probability of i1 remains selected is (1-1/3) * 1/2 = 1/3; the probability of i2 remains selected is (1-1/3) * 1/2 = 1/3
  • for current index, the p to select it is 1/count
  • for any previous index, the p to not select current index is 1 - 1/count and the p to select the previous index is 1/(count-1). The final p of still selecting the previous index is 1/count

Option 2: hash map

T = O(n) for initialization and O(1) for pick. S = O(n) build the map.

class Solution:

    def __init__(self, nums: List[int]):
        self.num_to_index = collections.defaultdict(list)
        for i, num in enumerate(nums):
            self.num_to_index[num].append(i)

    def pick(self, target: int) -> int:
        sz = len(self.num_to_index[target])
        index = random.randrange(0, sz)
        return self.num_to_index[target][index]

=============================================================================

Reservoir sampling is used to sample k items from a total of n items, where n can be very large such that we cannot stored all data in memory or the input is a stream whose length is n.

(*
  S has items to sample, R will contain the result
 *)
ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i = 1 to k
      R[i] := S[i]

  // replace elements with gradually decreasing probability
  for i = k+1 to n
    j := random(1, i)   // important: inclusive range
    if j <= k
        R[j] := S[i]

suppose p_i is the probability that an item within arr[0 .. i-1] in picked at round i, and suppose p_(i-1) = k/(i-1).

  • for old item (i.e., arr[0 ... i-1]), p = k/(i-1) * (i-1)/i = k/i
  • for the new item (i.e., arr[i]), p = k/i

In this problem, k = 1. Using hash table to store would take more memory than using array, leading to memory limited error on OJ

class Solution {
public:
    Solution(vector<int> nums) {
        this->nums = nums;
    }
    
    int pick(int target) {
        int res = -1; //guaranteed to be assigned to 0 when i == 0
        int cnt = 0;
        for (int i = 0; i < nums.size(); i++) {
            if (nums[i] == target) {
                cnt++;
                if (rand() % cnt == 0) {
                    res = i; 
                }
            }
        }
        return res;
    }
private:
    vector<int> nums;
};
⚠️ **GitHub.com Fallback** ⚠️