4. Adding a GenServer Part 2

After ensuring I have a working test for the GenServer, it is back to the requirements.

Requirements

As it stands, the updated list looks like this:

  1. :remote app launches GenServer when the application starts - DONE
  2. Holds state %{max_number: :integer, timestamp: :utc_datetime} - DONE
  • :timestamp starts as nil and is updated each time someone queries the GenServer - DONE
  • :timestamp represents the last time someone queried the GenServer - TODO
  3. Every 60 seconds, the GenServer runs a job - DONE
  • The GenServer updates every user's points value with a random value between 0..100 - TODO
  • The GenServer updates :max_number in the state to a new random number between 0..100 - DONE
  4. The GenServer accepts a call to return users with points greater than :max_number, with a limit of 2, and the :timestamp from the previous call. - TODO

Given that the second part of the second requirement depends on requirement 3, I will come back to it after implementing the update.
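Before moving on, here is a minimal sketch of the GenServer shape these requirements imply; the module name Remote.UserServer and the :update message are my placeholders, not the wiki's actual code:

defmodule Remote.UserServer do
  use GenServer

  # 60-second tick from requirement 3
  @tick 60_000

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # requirement 2: :timestamp starts as nil
    Process.send_after(self(), :update, @tick)
    {:ok, %{max_number: Enum.random(0..100), timestamp: nil}}
  end

  @impl true
  def handle_info(:update, state) do
    # requirement 3: pick a new :max_number every 60 seconds
    Process.send_after(self(), :update, @tick)
    {:noreply, %{state | max_number: Enum.random(0..100)}}
  end
end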

Updating 1,000,000 Rows

I tested several options.

  1. I used Ecto.Repo.update_all/2
$ iex -S mix
Erlang/OTP 24 [erts-12.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]

Interactive Elixir (1.13.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Remote.Repo.update_all(Remote.Users.User, set: [points: Enum.random(0..100), updated_at: DateTime.utc_now()])
[debug] QUERY OK source="users" db=3870.1ms decode=0.6ms queue=0.1ms idle=278.6ms
UPDATE "users" AS u0 SET "points" = $1, "updated_at" = $2 [67, ~U[2022-01-10 10:17:03Z]]
{1000000, nil}

Unfortunately, all rows were updated to the same random number: Enum.random(0..100) is evaluated once in Elixir before the query is sent, so every row receives that single value. Using Ecto.Repo.update_all/2 was therefore not an option without additional looping to generate a random number per row.
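That extra looping would look something like the sketch below (using Remote.Repo and Remote.Users.User as above). It gives each row its own random value, but it issues one UPDATE per row, which is hopeless at 1,000,000 rows:

alias Remote.Repo
alias Remote.Users.User

# one SELECT to load every user, then 1,000,000 individual UPDATEs
User
|> Repo.all()
|> Enum.each(fn user ->
  user
  |> Ecto.Changeset.change(points: Enum.random(0..100))
  |> Repo.update!()
end)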

  2. Rather than use Enum, I tried raw SQL as a fragment, generating the number with the Postgres random() function. I started by working out how to get a random :integer:

# random function - random number 0.0 to 1.0
SELECT random()
0.9474413789530018

# random function with floor - rounds the random number down (always 0 for values < 1)
SELECT floor(random())
0

# random function with floor, multiplied by (high number - low number)
select floor(random() * (100 - 0))
44 

# But if you try it with a low number other than 0, values can fall below the low bound
select floor(random() * (100 - 20)) 
11

# So to account for variable ranges, add the low value back on
select floor(random() * (100 - 20)) + 20 
52

# floor() actually returns `double precision`, so it is also important to cast the value
select cast(floor(random() * (100 - 20)) + 20 AS integer) 
68
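The same expression can be sanity-checked from iex through Ecto; a quick sketch (the returned value varies per run):

Ecto.Adapters.SQL.query!(
  Remote.Repo,
  "SELECT cast(floor(random() * (100 - 20)) + 20 AS integer)"
)
#=> %Postgrex.Result{rows: [[52]], num_rows: 1, ...}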
 

With a SQL fragment to generate a random :integer, I then tested the code:

import Ecto.Query

alias Remote.Repo
alias Remote.Users.User

# bounds for the random points value
min_value = 0
max_value = 100

update(User,
  set: [
    points:
      fragment(
        "cast(floor(random() * (? - ?)) + ? AS integer)",
        type(^max_value, :integer),
        type(^min_value, :integer),
        type(^min_value, :integer)
      ),
    updated_at: fragment("now() at time zone 'utc'")
  ]
)
|> Repo.update_all([])

This worked fine but still took between 5 and 9 seconds.
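For reference, timings like these can be captured with :timer.tc/1; a sketch, assuming the pipeline above is wrapped in a hypothetical run_update/2 helper:

# run_update/2 is assumed to wrap the update/Repo.update_all pipeline above
{micros, {count, nil}} = :timer.tc(fn -> run_update(0, 100) end)
IO.puts("updated #{count} rows in #{div(micros, 1_000)}ms")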

  3. Create a new table by copying the users table, update the new table, then DROP the old users table and ALTER the new table to rename it

CREATE TABLE updated_users AS
	SELECT * FROM users;

UPDATE updated_users 
	SET points = floor(random() * (100 + 1)), updated_at = now() at time zone 'utc';
	
DROP TABLE IF EXISTS users;
ALTER TABLE updated_users RENAME TO users;

# Results

SELECT 1000000
UPDATE 1000000
DROP TABLE
ALTER TABLE
5.1s

This consistently took more than 5s. Running each query separately reduced the ALTER TABLE to 18ms, but the operation was still long enough that a call querying the table while the new table was being copied and updated would result in a timeout.

  4. Combine the CREATE and UPDATE queries to reduce the time the rows are locked

The default timeout for a GenServer.call/3 is 5 seconds, so I was aiming for something that consistently updated 1,000,000 rows in less than half that time.
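To make that budget concrete, the timeout is simply GenServer.call/3's third argument; while the server is busy inside handle_info, any queued call has to be answered within this window (the server and message names below are placeholders):

# both calls are equivalent; 5_000ms is the default timeout
GenServer.call(Remote.UserServer, :get_users)
GenServer.call(Remote.UserServer, :get_users, 5_000)

With that budget in mind, I rewrote my query: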


CREATE TABLE updated_users
(
    id,
    points,
    inserted_at,
    updated_at
) AS 
  SELECT id, cast(floor(random() * (100 - 0)) + 0 AS integer), inserted_at, now() at time zone 'utc'
  FROM users;

# Result 
SELECT 1000000

2.2s

By combining this with the DROP and ALTER queries, I updated 1,000,000 rows, setting points to a new random :integer and updated_at to the current UTC timestamp, in just over 2 seconds.

I created two functions to be used by the handle_info callback:


def update_all_users_points(max, min) do
  # Run the three statements in order: build the replacement table,
  # drop the old one, and rename the replacement into place
  Enum.each(update_sql(min, max), fn sql ->
    # query!/2 raises on failure so a broken step is not silently ignored
    Ecto.Adapters.SQL.query!(Repo, sql)
  end)
end

defp update_sql(min, max) do
  # min/max are interpolated directly into the SQL, so they must be
  # trusted integers (they come from our own code, not user input)
  [
    """
    CREATE TABLE updated_users (id, points, inserted_at, updated_at) AS
      SELECT id, cast(floor(random() * (#{max} - #{min})) + #{min} AS integer), inserted_at, now() at time zone 'utc'
      FROM users;
    """,
    """
    DROP TABLE users;
    """,
    """
    ALTER TABLE updated_users RENAME TO users;
    """
  ]
end
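Wired into the GenServer, the 60-second job might then call it like this; a sketch, assuming the functions above live in a Remote.Users context module:

@impl true
def handle_info(:update, state) do
  # requirement 3: replace every user's points, then pick a new :max_number
  Remote.Users.update_all_users_points(100, 0)
  Process.send_after(self(), :update, 60_000)
  {:noreply, %{state | max_number: Enum.random(0..100)}}
end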