ftlHashSet - SCM-NV/ftl GitHub Wiki
ftlHashSet is a container for a set of unique values. Search, insertion, and removal of values have average constant-time complexity. It basically represents a set in the mathematical sense.
Internally, the values are not sorted in any particular order, but organized
into buckets. Which bucket an element is placed into depends entirely on the
hash of its value. This allows fast access to individual elements, since once
the hash is computed, it refers to the exact bucket the element is placed into.
The value types have to implement the ftlHash
method and be comparable with
the ==
operator. Hash functions for the basic Fortran types are predefined
in the ftlHash module and can be combined to implement hash functions for
any derived types.
ftlHashSet basically a Fortran reimplementation of C++'s std::unordered_set with a slightly different (and more intuitive) interface.
ftlHashSet uses the following macros for instantiation:
FTL_TEMPLATE_TYPE
- The type of the objects to be stored in the container, e.g.
integer(int64)
orMyDerivedType
. Do not add the enclosingtype()
for derived types. FTL_TEMPLATE_TYPE_IS_DERIVED
- This needs to be defined if
FTL_TEMPLATE_TYPE
is a derived type. If the type is derived, it must implement theftlHash
function and be comparable with the==
operator. FTL_TEMPLATE_TYPE_MODULE
- The name of the module in which
FTL_TEMPLATE_TYPE
is defined. Specifying this is probably only necessary ifFTL_TEMPLATE_TYPE
is a derived type, or uses a kind defined in another module, e.g.int64
from theiso_fortran_env
module. FTL_TEMPLATE_TYPE_NAME
- A convenient user-readable name for the stored type, e.g.
Int
. This will be used in the type-name of the dynArray container itself and the module in which it is available, e.g. setting this toInt
will result in the typeftlHashSetInt
available in moduleftlHashSetIntModule
. FTL_INSTANTIATE_TEMPLATE
- If this macro is defined the template will be instantiated. If not, the template will not be instantiated, but some information about the ftlHashSet container (e.g. its name and the kind of iterator it provides) will be passed on to subsequent template instantiations (e.g. ftlAlgorithms).
There is a specialization of the ftlHashSet template for ftlString as the
element type. It is accessed by defining FTL_TEMPLATE_TYPE_IS_FTLSTRING
. Its
sole purpose is to add an alternative version for all methods that accept an
element as input. The alternative version accepts a plain Fortran character
variable as input instead of a ftlString
. This in convenient when passing a
string literal. Of course the normal versions accepting an ftlString
are
also available.
type(ftlHashSetString) :: set
call set%Insert(ftlString('inconvenient'))
! ... is inconvenient compared to:
call set%Insert('convenient')
! ... but both versions are totally equivalent
If FTL_TEMPLATE_TYPE_IS_FTLSTRING
is defined, it is not needed to define
either FTL_TEMPLATE_TYPE_MODULE
or FTL_TEMPLATE_TYPE_IS_DERIVED
. (Since
we know that the keytype is exactly ftlString
, we already know the module in
which it is defined and that it is a derived type.) Note however, that
FTL_TEMPLATE_TYPE_NAME
should still be defined.
Example instantiation of a set of integers. The resulting type ftlHashSetInt
is available in ftlHashSetIntModule
.
#define FTL_TEMPLATE_TYPE integer
#define FTL_TEMPLATE_TYPE_NAME Int
#define FTL_INSTANTIATE_TEMPLATE
#include <ftlHashSet.F90_template>
Constructs a new ftlHashSet container. There a two versions:
Constructs an empty container with a specified number of buckets.
subroutine New(self, nBuckets) type(ftlHashSetT), intent(out) :: self integer , intent(in) :: nBucketsFor best performance, the initial number of buckets should be chosen to be slightly larger than the expected number of elements to be stored in the container. If this information is not available, it should be a small number as the number of buckets will grow dynamically as more elements are stored in the container.
Constructs an ftlHashSet as a copy of another ftlHashSet.
subroutine New(self, other) type(ftlHashSetT), intent(out) :: self type(ftlHashSetT), intent(in) :: self
Destructs the container. The used storage is deallocated.
subroutine Delete(self) type(ftlHashSetT), intent(inout) :: self
Assignment of one ftlHashSet to another is exactly the same as copy-construction, see above.
subroutine assignment(=)(self, other) type(ftlHashSetT), intent(out) :: self type(ftlHashSetT), intent(in) :: other
WARNING: Like all iterators, iterators of the ftlHashSet container provide a
it%value
member, that can be used to access the value pointer to be the
iterator. While it is technically possible to change it%value
by assigning
to it or passing it into subroutines writing to it, this must never be done,
as it would leave the set in an inconsistent state if the new value should be
assigned to a different bucket. Use the %value
field of ftlHashSet iterators
strictly read-only! Note that the only ftlAlgorithms that are safe to run
on an ftlHashSet are the "non-modifying sequence operations". (Luckily it also
doesn't make much sense to run any of the other algorithms ...)
Returns a random access iterator to the first element of the container.
ftlHashSetTIterator function Begin(self) type(ftlHashSetT), intent(in) :: selfNotice that an ftlHashSet object makes no guarantees on which specific element is considered its first element. But, in any case, the range that goes from its begin to its end covers all the elements in the container.
If the container is empty, the returned iterator will be equal to
End()
. Note that this method exists as both type-bound and as a free function.
Returns a random access iterator to the element following the last element of the container. This element acts as a placeholder; attempting to access it results in undefined behavior.
ftlHashSetTIterator function End(self) type(ftlHashSetT), intent(in) :: selfNotice that an ftlHashSet object makes no guarantees on which order its elements follow. But, in any case, the range that goes from its begin to its end covers all the elements in the container.
Note that this method exists as both type-bound and as a free function.
Returns a logical value indicating whether the set is empty, i.e. whether its size is 0.
logical function Empty(self) type(ftlHashSetT), intent(in) :: selfThis function does not modify the content of the container in any way. To clear the contents, member function
ftlHashSet%Clear()
exists.
Returns the number of entries in the ftlHashSet container.
integer function Size(self) type(ftlHashSetT), intent(in) :: self
Inserts a value into the set. If the value is already in the set, the container remains unchanged.
subroutine Set(self, key, value) type(ftlHashSetT), intent(inout) :: self T , intent(in) :: valueInsertion into the set is a constant time operation, except for the case where the insertion causes the maximum load factor to be exceeded. In this case all N elements of the set are rehashed, which is an O(N) operation.
Checks whether a value is in the set.
logical function Has(self, value) type(ftlHashSetT), intent(in) :: self T , intent(in) :: value
Python style for checking whether a set has an entry for a particular key. Absolutely equivalent to
ftlHashSet%Has()
.logical function operator(.in.)(lhs, rhs) T , intent(in) :: lhs type(ftlHashSetT), intent(in) :: rhs
Searches the container for a value and returns an iterator to it if found, otherwise it returns an iterator to
ftlHashSet%End()
(the element past the end of the container).type(ftlHashSetTIterator) function Find(self, value) type(ftlHashSetT), intent(in) :: self T , intent(in) :: value
Removes entries from the ftlHashSet container, reducing the size of the container by the number of removed elements.
Remove the entry with a particular value. This has no effect if the set contains no such entry.
subroutine Erase(self, value) type(ftlHashSetT), intent(inout) :: self T , intent(in) :: valueRemove the entry pointed to be a particular iterator.
subroutine Erase(self, it) type(ftlHashSetT) , intent(inout) :: self type(ftlHashSetTIterator), intent(in) :: itRemoves all entries in the (half open) range delimited by the iterators [first,last).
subroutine Erase(self, first, last) type(ftlHashSetT) , intent(inout) :: self type(ftlHashSetTIterator), intent(in) :: first, last
Removes all entries from the set, leaving it with a size of zero.
subroutine Clear(self) type(ftlHashSetT), intent(inout) :: self
Returns the number of buckets in the ftlHashSet container.
The number of buckets influences directly the load factor of the container's hash table (and thus the probability of collision). The container automatically increases the number of buckets to keep the load factor below a specific threshold (its maxLoadFactor), causing a rehash each time the number of buckets needs to be increased.
integer function BucketCount(self) type(ftlHashSetT), intent(in) :: self
Returns the number of elements in bucket
n
.The number of elements in a bucket influences the time it takes to access a particular element in the bucket. The container automatically increases the number of buckets to keep the load factor (which is the average bucket size) below its maxLoadFactor.
integer function BucketSize(self, n) type(ftlHashSetT), intent(in) :: self integer , intent(in) :: n
Returns the bucket number where the element with value is (or would be) located. Buckets are numbered from 1 to BucketCount.
integer function Bucket(self, value) type(ftlHashSetT), intent(in) :: self T , intent(in) :: value
Returns the current load factor in the ftlHashSet container, that is the ratio between the number of elements in the container (its size) and the number of buckets.
real function LoadFactor(self) type(ftlHashSetT), intent(in) :: selfThe load factor influences the probability of collision in the hash table (i.e., the probability of two elements being located in the same bucket). The container automatically increases the number of buckets to keep the load factor below a specific threshold (its maxLoadFactor), causing a rehash each time an expansion is needed.
To retrieve or change this threshold, use member functions GetMaxLoadFactor and SetMaxLoadFactor:
Sets/Gets the maximum load factor (number of elements per bucket). The container automatically increases the number of buckets if the load factor exceeds this threshold.
subroutine SetMaxLoadFactor(self, ml) type(ftlHashSetT), intent(inout) :: self real , intent(in) :: ml real function GetMaxLoadFactor(self) type(ftlHashSetT), intent(in) :: self
Sets the number of buckets in the container to n, forcing a rehash of the elements in the container.
subroutine Rehash(self, nBuckets) type(ftlHashSetT), intent(inout) :: self integer , intent(in) :: nBucketsA rehash is the reconstruction of the hash table: All the elements in the container are rearranged according to their hash value into the new set of buckets. This may alter the order of iteration of elements within the container.
Rehashes are automatically performed by the container whenever its load factor is going to surpass its maxLoadFactor in an operation.
Notice that this subroutine expects the number of buckets as argument. A similar subroutine exists,
ftlHashSet%Reserve()
, that expects the number of elements in the container as argument.
Sets the number of buckets in the container (bucketCount) to the most appropriate to contain at least n elements.
subroutine Reserve(self, n) type(ftlHashSetT), intent(inout) :: self integer , intent(in) :: nIf n is greater than the current bucketCount multiplied by the maxLoadFactor, the container's bucketCount is increased and a rehash is forced.
If n is lower than that, the subroutine has no effect.