threading - ryzom/ryzomcore GitHub Wiki

title: Threading description: published: true date: 2023-03-01T05:17:11.537Z tags: editor: markdown dateCreated: 2022-03-07T10:55:17.419Z

The NLMISC module of NeL provides a variety of tools for programmers to make their applications threaded and to ensure thread-safety. One of the key interface classes is IRunnable, which serves as the basis for most threaded logic. Any logic that needs to be executed in a thread should be created as a sub-class of IRunnable.

A new thread can be launched through IThread. The create method is used to convert an IRunnable object into a thread. Once a thread object is created, it can be manipulated using various methods such as start, terminate, wait, and isRunning.

The most commonly used mutex class in NeL is CMutex. Using the CAutoMutex template for locking the mutex ensures that it is always correctly unlocked, even in the case of exceptions being thrown within the lock. The CUnlockableAutoMutex template provides the option to leave the mutex early.

Synchronized object templates are the easiest and most reliable way to make a specific variable thread-safe. By wrapping a variable in a synchronized object template, any access to the variable is forced to be thread-safe. The CSynchronized template is the most frequently used synchronized object template in NeL.

CAtomicFlag, CAtomicInt, and CAtomicEnum are classes for atomic operations on flags, integers, and enumeration types. CAtomicLockSpin and CAtomicLockYield are spin locks used to synchronize access to shared data in a multithreaded program.

Threads

IRunnable

IRunnable is a small interface that serves as the basis for most threaded logic. Any logic that you want executed in a thread goes into a sub-class of IRunnable. Here is the IRunnable interface:

class IRunnable
{
public:
	// Called when a thread is run.
	virtual void run()=0;
	virtual ~IRunnable() { }
	virtual void getName(std::string &result) const { result = "NoName"; }
};

You can see that the interface is fairly simple. All you're required to do is implement the run method with your logic. You don't have to implement the getName method, but it is highly advisable that you do for troubleshooting. Otherwise you will soon have logs filled with a thread name of "NoName" and a thread ID that makes no sense outside of a debugger. Below is an example of some logic created within IRunnable that can be threaded.

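class HelloLoopThread : public IRunnable
{
public:
	// Entry point of the thread: prints forever.
	void run()
	{
		while(true)
			printf("Hello World!\n");
	}

	// Name reported for this thread, useful in logs.
	void getName(std::string &result) const
	{
		result = "HelloLoopThread";
	}
};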

Thread Management

Simply having an instance of an IRunnable object does you no good: you must have some way to create a thread and manage it. This is where the IThread class comes into play. The IThread class provides an interface for the thread instance that you will manage, and it also provides a couple of useful static methods: create and getCurrentThread. Both return an instance of an IThread object. The method getCurrentThread gives you the instance of the thread your code is currently executing in, but the more important of the two is create.

The create method is the method you use to convert IRunnable objects into threads. It takes an IRunnable object and the stack size. If the stack size you provide is 0, then it will use the default stack size as determined by the OS. Only provide a stack size if you have a specific reason for doing so. In nearly all scenarios, providing only the runnable instance is sufficient.

Once a thread object is created it isn't automatically running. The start method is used to start the thread, after which the isRunning method can be used to check whether the thread is still running.

In some cases, it may be necessary to stop a thread before it reaches its natural exit point. The terminate method can be used to kill the thread, but this is a risky method that should only be used in extreme cases. The wait method blocks the current thread until the specified thread exits, after which the thread object can be deleted.

Assuming that you have created the runnable from the previous example, here is an example of how you would manipulate the HelloLoopThread:

void threadManager()
{
	IThread *helloThread = IThread::create(new HelloLoopThread());
	helloThread->start();

	nlSleep(100);
	if(helloThread->isRunning()) // assume it's stuck in an infinite loop.
		helloThread->terminate();
}
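Because terminate is risky, a loop that checks a stop flag and exits on its own is usually the safer design. The sketch below models that pattern with standard C++ (std::thread and std::atomic) instead of NeL classes so that it stays self-contained. StoppableLoop, requestStop and runAndStop are hypothetical names, not part of NeL. With NeL, the same idea would be a run method that polls a flag, with the owner calling wait rather than terminate.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a runnable whose loop can be asked to stop.
class StoppableLoop
{
public:
	// Body of the thread: loop until a stop has been requested.
	void run()
	{
		while (!m_StopRequested.load(std::memory_order_acquire))
		{
			++m_Iterations;
			std::this_thread::yield();
		}
	}

	// Ask the loop to exit at its next check.
	void requestStop() { m_StopRequested.store(true, std::memory_order_release); }

	unsigned long iterations() const { return m_Iterations.load(); }

private:
	std::atomic<bool> m_StopRequested{false};
	std::atomic<unsigned long> m_Iterations{0};
};

// Starts the loop, waits until it has done some work, then stops it cooperatively.
bool runAndStop()
{
	StoppableLoop loop;
	std::thread worker(&StoppableLoop::run, &loop); // comparable to create() followed by start()

	while (loop.iterations() == 0) // wait until the loop has run at least once
		std::this_thread::yield();

	loop.requestStop();
	worker.join(); // comparable to wait()
	assert(!worker.joinable());
	return loop.iterations() > 0;
}
```

The thread leaves its loop at a point of its own choosing, so any locks or resources it holds are released normally.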

Task Manager

The CTaskManager class is a useful tool for managing tasks in a multithreaded environment. It allows you to create a list of tasks that need to be executed, and it will execute them in priority order. You can see an example of this at work in the CAsyncFileManager class, which shows how to implement performance-critical components of a game in a thread-safe way. The task manager is implemented as a single thread that continuously checks the task queue and executes any available tasks.

To use the task manager, you first create an instance of the CTaskManager class. You can then add tasks to the manager using the addTask() method. The method takes an IRunnable object, which defines the task to be executed, and a priority level, which can be used to prioritize the tasks.

Once you have added tasks to the manager, you can start the manager by calling the run() method. This method starts the manager's internal thread, which begins executing tasks from the task queue. The manager's internal thread will continue to execute tasks until there are no more tasks left in the queue or until the thread is stopped. You can add more tasks to the task manager after calling the run() function.

To stop the manager, you can call the deleteTask() method to remove any remaining tasks from the queue. Once the task queue is empty, the manager will automatically stop its internal thread.

The CTaskManager class also provides a number of utility methods that can be used to manage tasks. You can use the getNumWaitingTasks() method to determine the number of tasks waiting in the queue. The dump() method of the CTaskManager class is a useful debugging tool that allows you to inspect the list of tasks waiting to be executed.

Finally, the CTaskManager class also provides a callback mechanism for changing the priority of a task. This can be useful if you need to dynamically adjust the priority of tasks based on external factors. To use the callback mechanism, you first register an instance of the IChangeTaskPriority class with the task manager using the registerTaskPriorityCallback() method. The task manager will then call the getTaskPriority() method of the registered callback object to determine the priority of each task before executing it.
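To make the idea of a single worker thread draining a priority queue concrete, here is a simplified model written in standard C++. It is not CTaskManager's implementation. MiniTaskManager and its methods are hypothetical names, and the choice that higher numbers run first is an assumption made only for this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a single-threaded priority task runner.
class MiniTaskManager
{
public:
	// Queue a task; in this sketch, higher priority values run first.
	void addTask(int priority, std::function<void()> task)
	{
		assert(task != nullptr);
		{
			std::lock_guard<std::mutex> lock(m_Mutex);
			m_Queue.push(Entry{priority, std::move(task)});
		}
		m_Wake.notify_one();
	}

	// Start the single worker thread.
	void start() { m_Worker = std::thread(&MiniTaskManager::workerLoop, this); }

	// Let the worker drain the queue, then join it.
	void finish()
	{
		{
			std::lock_guard<std::mutex> lock(m_Mutex);
			m_Stop = true;
		}
		m_Wake.notify_one();
		m_Worker.join();
	}

private:
	struct Entry
	{
		int priority;
		std::function<void()> task;
		bool operator<(const Entry &other) const { return priority < other.priority; }
	};

	void workerLoop()
	{
		for (;;)
		{
			std::function<void()> task;
			{
				std::unique_lock<std::mutex> lock(m_Mutex);
				m_Wake.wait(lock, [this] { return m_Stop || !m_Queue.empty(); });
				if (m_Queue.empty())
					return; // stop requested and nothing left to do
				task = m_Queue.top().task;
				m_Queue.pop();
			}
			task(); // run outside the lock so new tasks can be queued meanwhile
		}
	}

	std::mutex m_Mutex;
	std::condition_variable m_Wake;
	std::priority_queue<Entry> m_Queue;
	std::thread m_Worker;
	bool m_Stop = false;
};

// Queues three tasks before starting, so the order depends only on priority.
std::vector<int> runInPriorityOrder()
{
	std::vector<int> order;
	MiniTaskManager manager;
	manager.addTask(1, [&order] { order.push_back(1); });
	manager.addTask(3, [&order] { order.push_back(3); });
	manager.addTask(2, [&order] { order.push_back(2); });
	manager.start();
	manager.finish();
	return order;
}
```

Running each task outside the lock is the important design choice here: it lets other threads keep adding tasks while a long task executes.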

Synchronization

Mutexes

A mutex is a synchronization object used to protect a shared resource from simultaneous access by multiple threads. NeL provides several mutex types, but in most cases, CMutex will suffice.

There are a couple of things to keep in mind when using NeL's mutex implementations. Never attempt to share a mutex across processes, and do not recursively enter the same mutex in succession within the same thread. Each enter() must be followed by a leave() before calling enter() again.

The mutex classes available in NeL are:

CMutex: Wraps a std::mutex under C++14, and either a CFairMutex or CUnfairMutex on legacy build targets.
CFairMutex: A fair mutex that ensures threads are served in the order they requested the lock.
CUnfairMutex: A classic mutex that is not fair and may exhibit thread starvation.
CFastMutex: An unfair mutex with low-level optimizations for speed.
CSharedMutex: A mutex that can be shared between processes without explicitly sharing the handle. This is an advanced feature.

Using a mutex is simple. You first create an instance of the mutex, and then use the enter() and leave() methods to lock and unlock it, respectively. Here's a brief example to illustrate:

class CMyClass
{
	// Create a mutex
	CMutex m_Mutex;
	// The value protected by the mutex
	uint32 m_MyVal;

public:
	void setValue(uint32 val)
	{
		// Enter the mutex
		m_Mutex.enter();

		// Work with the values you want to be threadsafe
		m_MyVal = val;

		// Exit the mutex
		m_Mutex.leave();
	}
};

However, manually using enter() and leave() can lead to deadlocks if exceptions are thrown while holding the lock. A better practice is to use CAutoMutex, which automatically unlocks the mutex when the lock goes out of scope. This has zero effect on performance and is the preferred way of using mutexes. In case you want the option to leave the mutex early, use CUnlockableAutoMutex.

class CMyClass
{
	// Create a mutex
	CMutex m_Mutex;
	// The value protected by the mutex
	uint32 m_MyVal;

public:
	void setValue(uint32 val)
	{
		// Lock the mutex, it will be unlocked by the CAutoMutex destructor
		CAutoMutex<CMutex> lock(m_Mutex);

		// Work with the values you want to be threadsafe
		m_MyVal = val;

		// Nothing here, the mutex is unlocked when it leaves scope!
	}
};
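To illustrate why a scoped lock is safer than paired enter() and leave() calls, here is a minimal sketch of the underlying RAII idea in standard C++. SimpleMutex and ScopedLock are hypothetical stand-ins written for this example. They are not NeL's CMutex or CAutoMutex, and the held() method exists only so the result can be inspected.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical mutex exposing enter()/leave() plus a flag we can inspect.
class SimpleMutex
{
public:
	void enter() { m_Impl.lock(); m_Held = true; }
	void leave() { m_Held = false; m_Impl.unlock(); }
	bool held() const { return m_Held; }

private:
	std::mutex m_Impl;
	bool m_Held = false;
};

// Hypothetical scoped lock: enters in the constructor, leaves in the destructor.
template <class TMutex>
class ScopedLock
{
public:
	explicit ScopedLock(TMutex &mutex) : m_Mutex(mutex) { m_Mutex.enter(); }
	~ScopedLock() { m_Mutex.leave(); }
	ScopedLock(const ScopedLock &) = delete;
	ScopedLock &operator=(const ScopedLock &) = delete;

private:
	TMutex &m_Mutex;
};

// Throws while holding the lock, then reports whether the mutex was released.
bool releasedAfterException()
{
	SimpleMutex mutex;
	try
	{
		ScopedLock<SimpleMutex> lock(mutex);
		assert(mutex.held());
		throw std::runtime_error("failure inside the critical section");
	}
	catch (const std::runtime_error &)
	{
		// Stack unwinding has already run ~ScopedLock() at this point.
	}
	return !mutex.held();
}
```

With manual enter() and leave(), the throw would skip the leave() call and the mutex would stay locked.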

In almost all scenarios, CMutex is sufficient, but you can use CFastMutex in some special cases where speed is crucial and thread starvation is not a concern.

Synchronized Variables

The previous section used the low-level mutex classes to make a block of code thread-safe, but there is an easier and more reliable way to make a specific variable thread-safe. The synchronized object templates are the most common way of implementing thread safety in NeL. By wrapping your variable in one of the provided templates you effectively force any access to the variable to be thread-safe. There are three synchronized object templates:

CSynchronized: This uses the default mutex.
CFairSynchronized: This uses a fair mutex.
CUnfairSynchronized: This uses an unfair mutex.

Of these templates, the most frequently used one is CSynchronized. CUnfairSynchronized is only used in cases where performance is a concern and thread starvation is not considered to be an issue. The following example shows how to use the CSynchronized template to protect a member:

class CMyClass
{
	CSynchronized<std::vector<uint32> > m_Foo;

public:
	// The member m_Foo is locked for as long as the accessor is in scope
	void pushFooValue(uint32 val)
	{
		// Get an accessor to the member (the accessor takes a pointer to the synchronized object)
		CSynchronized<std::vector<uint32> >::CAccessor access(&m_Foo);
		// Append the value
		access.value().push_back(val);
	}
};
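The accessor pattern can be sketched in a few lines of standard C++, which may help show why a wrapped variable cannot be touched without holding the lock. Synchronized and Accessor below are hypothetical stand-ins for illustration, not NeL's templates. The pointer-taking constructor is an assumption chosen to resemble the usage above.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: the value is only reachable through an Accessor.
template <class T>
class Synchronized
{
public:
	class Accessor
	{
	public:
		// Takes a pointer to the wrapper and holds its mutex until destroyed.
		explicit Accessor(Synchronized<T> *owner) : m_Owner(owner), m_Lock(owner->m_Mutex) {}
		T &value() { return m_Owner->m_Value; }

	private:
		Synchronized<T> *m_Owner;
		std::lock_guard<std::mutex> m_Lock;
	};

private:
	std::mutex m_Mutex;
	T m_Value{};
};

// Two threads append 1000 values each; returns the final element count.
std::size_t fillFromTwoThreads()
{
	Synchronized<std::vector<int>> shared;
	auto producer = [&shared] {
		for (int i = 0; i < 1000; ++i)
		{
			Synchronized<std::vector<int>>::Accessor access(&shared);
			access.value().push_back(i);
		}
	};
	std::thread a(producer);
	std::thread b(producer);
	a.join();
	b.join();

	Synchronized<std::vector<int>>::Accessor access(&shared);
	assert(!access.value().empty());
	return access.value().size();
}
```

Since the value is private and the accessor is the only way in, forgetting to lock becomes a compile error rather than a data race.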

Atomics

Atomic Flag

The nel/misc/atomic.h header file defines the CAtomicFlag class, which provides an atomic flag. The flag is initialized as clear and the only supported memory orders are TMemoryOrderAcquire for testAndSet() and TMemoryOrderRelease for clear(). Higher memory orders may be used depending on the implementation.

This flag is particularly useful for spinlocks, and the class provides the following member functions:

testAndSet(): Atomically sets the flag to true and returns the previous value.
clear(): Atomically sets the flag to false.
test(): Retrieves the current value of the flag without changing it.

The class implementation varies depending on the compiler and language version being used. In general, the implementation uses either STL primitives, GCC built-ins, or the Win32 API to implement the atomic operations.

It is important to note that CAtomicFlag instances cannot be copied.
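The "set and return the previous value" behaviour is what makes such a flag useful: only the caller that sees the old value as clear has won. The sketch below shows this with std::atomic_flag from the standard library, which the text names as one possible backend. OneShot, tryClaim and reset are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-shot guard built on std::atomic_flag.
class OneShot
{
public:
	// Returns true only for the caller that found the flag clear.
	bool tryClaim() { return !m_Flag.test_and_set(std::memory_order_acquire); }

	// Clears the flag so it can be claimed again.
	void reset() { m_Flag.clear(std::memory_order_release); }

private:
	std::atomic_flag m_Flag = ATOMIC_FLAG_INIT;
};

// Claims the guard three times with one reset in between.
int countSuccessfulClaims()
{
	OneShot guard;
	int claims = 0;
	if (guard.tryClaim()) ++claims; // succeeds: the flag was clear
	if (guard.tryClaim()) ++claims; // fails: the flag is already set
	guard.reset();
	if (guard.tryClaim()) ++claims; // succeeds again after the reset
	assert(claims <= 3);
	return claims;
}
```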

Atomic Integer

The CAtomicInt class is an atomic integer implementation that provides atomic operations for shared integer variables. It supports multiple memory orderings, with the highest supported memory orders being acquire and release.

This class provides the following atomic operations:

load(): Loads the current value of the atomic integer with the given memory order.
store(): Stores a new value in the atomic integer with the given memory order.
fetchAdd(): Adds a given value to the atomic integer and returns the previous value with the given memory order.
exchange(): Atomically swaps the current value of the atomic integer with a new value and returns the previous value with the given memory order.

The class also supports the following operators:

+=, -=, ++, --: Perform arithmetic operations on the atomic integer in an atomic manner.
==, !=, <, <=, >, >=: Compare the atomic integer with a given value. The value is read in a thread-safe manner, but the comparison as a whole is not guaranteed to be atomic.

This class has native implementations for different platforms, and it selects the appropriate implementation based on the platform and available features. The native implementation uses platform-specific atomic operations for the best performance.
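The semantics of these operations, in particular that fetchAdd() and exchange() return the previous value, can be illustrated with std::atomic<int> from the standard library. This is a sketch using the standard type rather than CAtomicInt itself. CounterResult and both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result record for the single-threaded demonstration.
struct CounterResult
{
	int previousOnAdd;
	int previousOnExchange;
	int finalValue;
};

// Shows that fetch_add and exchange both hand back the old value.
CounterResult demoFetchAddAndExchange()
{
	std::atomic<int> value(10);
	CounterResult r;
	r.previousOnAdd = value.fetch_add(5, std::memory_order_release);       // value: 10 -> 15
	r.previousOnExchange = value.exchange(100, std::memory_order_acq_rel); // value: 15 -> 100
	r.finalValue = value.load(std::memory_order_acquire);
	return r;
}

// Two threads increment the same counter without any lock.
int countFromTwoThreads()
{
	std::atomic<int> counter(0);
	auto work = [&counter] {
		for (int i = 0; i < 100000; ++i)
			counter.fetch_add(1, std::memory_order_relaxed);
	};
	std::thread a(work);
	std::thread b(work);
	a.join();
	b.join();
	assert(counter.load() > 0);
	return counter.load();
}
```

With a plain int instead of the atomic, the two threads could overwrite each other's increments and the final count would be unpredictable.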

Atomic Enum

The CAtomicEnum class is a template class that provides an atomic version of an enumeration type. Under C++14, the implementation uses std::atomic to support enums with specific underlying types. Otherwise, it uses the CAtomicInt class to provide atomic operations in a cross-platform manner.

This class has the same atomic load and store operations as CAtomicInt, as well as comparison operators. However, it does not provide arithmetic operators.

CAtomicEnum is intended to be used in multithreaded code where multiple threads may access the same enumeration value concurrently, so that a value stored by one thread becomes visible to the others.
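A typical use of an atomic enumeration is a state variable that one thread updates and another polls. The sketch below uses std::atomic with an enum class, which the text names as the C++14 backend. LoadState and publishAndObserve are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical state enumeration shared between two threads.
enum class LoadState
{
	Idle,
	Loading,
	Ready
};

// One thread publishes a state change, the calling thread waits for it.
LoadState publishAndObserve()
{
	std::atomic<LoadState> state(LoadState::Idle);
	int payload = 0;

	std::thread loader([&] {
		state.store(LoadState::Loading, std::memory_order_release);
		payload = 42; // plain write, published by the release store below
		state.store(LoadState::Ready, std::memory_order_release);
	});

	while (state.load(std::memory_order_acquire) != LoadState::Ready)
		std::this_thread::yield();

	// The acquire load that saw Ready makes the earlier payload write visible.
	assert(payload == 42);

	loader.join();
	return state.load(std::memory_order_acquire);
}
```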

Atomic Spinlock

The CAtomicLockSpin and CAtomicLockYield classes are both spin locks that allow threads to synchronize access to shared data in a multithreaded program. A spin lock is a lock that causes a thread trying to acquire it to spin in a loop while waiting for the lock to become available, rather than blocking and relinquishing the CPU to another thread.

CAtomicLockSpin uses a simple spin lock mechanism to acquire the lock, repeatedly trying to set a CAtomicFlag until it is successful. CAtomicLockYield works in a similar way, but yields the thread if the lock is not available, rather than spinning in a loop.

Using spin locks can be more efficient than using other types of locks, such as mutexes, in certain situations, particularly when the critical section of code is very short and contention for the lock is expected to be low.

However, spin locks can also be less efficient than other types of locks when contention is high, as they can waste CPU cycles while spinning in a loop waiting for the lock to become available. Therefore, it is important to choose the right type of lock for your specific use case, and to consider the potential trade-offs between efficiency and simplicity of code.

Here's a basic usage example.

#include <nel/misc/thread.h>
#include <nel/misc/atomic.h>

using namespace NLMISC;

CAtomicFlag g_Flag;
int g_Counter = 0;

class MyRunnable : public IRunnable
{
public:
    virtual void run() NL_OVERRIDE
    {
        for (int i = 0; i < 1000000; ++i)
        {
            // Acquire a lock with CAtomicLockSpin
            CAtomicLockSpin lock(g_Flag);

            // Access the shared data
            ++g_Counter;
        }
    }
};

int main()
{
    MyRunnable runnable;
    IThread *t1 = IThread::create(&runnable);
    IThread *t2 = IThread::create(&runnable);

    t1->start();
    t2->start();

    t1->wait();
    t2->wait();

    delete t1;
    delete t2;

    // The value of g_Counter is expected to be 2000000
    return g_Counter;
}
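For comparison, the yielding variant of the same pattern can be sketched with std::atomic_flag and std::this_thread::yield from the standard library. YieldLock is a hypothetical class written for this example, not the CAtomicLockYield implementation. It only differs from a pure spin lock in what happens while waiting.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical scoped lock over std::atomic_flag that yields while waiting.
class YieldLock
{
public:
	explicit YieldLock(std::atomic_flag &flag) : m_Flag(flag)
	{
		// test_and_set returns true while another thread holds the lock.
		while (m_Flag.test_and_set(std::memory_order_acquire))
			std::this_thread::yield(); // a pure spin lock would retry without yielding
	}
	~YieldLock() { m_Flag.clear(std::memory_order_release); }
	YieldLock(const YieldLock &) = delete;
	YieldLock &operator=(const YieldLock &) = delete;

private:
	std::atomic_flag &m_Flag;
};

// Two threads increment a plain int that is protected only by the lock.
int countWithYieldLock()
{
	std::atomic_flag flag = ATOMIC_FLAG_INIT;
	int counter = 0;
	auto work = [&] {
		for (int i = 0; i < 100000; ++i)
		{
			YieldLock lock(flag);
			++counter;
		}
	};
	std::thread a(work);
	std::thread b(work);
	a.join();
	b.join();
	assert(counter > 0);
	return counter;
}
```

Yielding gives the lock holder a chance to run when there are more threads than cores, at the cost of a slightly slower hand-over when the lock is released quickly.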

Source