Binning - westpa/westpa GitHub Wiki

The Weighted Ensemble (WE) method enhances sampling by partitioning the space defined by the progress coordinates into non-overlapping bins. WESTPA provides a number of predefined bin types that can be specified in the west.cfg file; these are detailed below. WESTPA 2.0 also includes custom ways to group trajectories for splitting and merging, namely adaptive binning and "binless" schemes.

When setting up bin boundaries in your west.cfg file, make sure to include a lower bound of zero even if your initial walkers are not near zero. More information is available in this mailing list post.

Table of Contents

- Overview
- Available Bin Mappers
  - RectilinearBinMapper
  - PiecewiseBinMapper
  - RecursiveBinMapper
  - FuncBinMapper
  - VectorizingFuncBinMapper
  - VoronoiBinMapper
- Where to Specify
- The Minimal Adaptive Binning Scheme
  - MAB Binning in One Direction
  - How to "Skip" Splitting in a MAB Binning Dimension
  - How to Print MAB Info in the west.log File
  - How to Print MAB Bins into binbounds.log File
- Binless Schemes

Overview

WE simulations involve dividing configurational space, typically into bins. Binning, or any scheme that groups walkers in configurational space, guides the WE algorithm to split walkers with different progress coordinate values and merge walkers with similar progress coordinate values. Users can define a rectilinear binning scheme in one-, two-, or three-dimensional space. It is also possible to devise nested binning schemes, in which one binning scheme is placed inside another. Both nested binning and adaptive Voronoi binning can focus sampling on specific regions of high-dimensional phase space. A good binning scheme promotes an even distribution of walkers per bin along the progress coordinate. Bin boundaries should be close enough together that walkers can transition into empty bins, but far enough apart that walkers do not vacate a bin too quickly. In addition, a good binning scheme should be effective at sampling the transition of interest, which usually means placing bins more finely along barriers in progress coordinate space.

Available Bin Mappers

Users are also free to implement their own bin mappers. A bin mapper must implement, at minimum, an assign(coords, mask=None, output=None) method, which maps each coordinate tuple in the array coords to an integer (numpy.uint16) indicating the bin that tuple falls into. The optional mask (a numpy bool array) marks which coordinates to process; entries that are False are skipped. This is used, for instance, by the recursive (nested) bin mapper to minimize the number of calculations needed to assign a coordinate tuple to a bin definitively. The optional output must be an integer (uint16) array of the same length as coords, into which assignments are written. The assign() method must return a reference to output. (This avoids allocating many temporary output arrays in complex binning scenarios.)

A user-defined bin mapper must also make an nbins property available, containing the total number of bins within the mapper.
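The following is a minimal sketch of a user-defined mapper that satisfies the interface described above: an assign(coords, mask=None, output=None) method that honors mask, fills and returns output, plus an nbins property. The class name ThresholdBinMapper and its threshold parameter are hypothetical. The sketch depends only on NumPy; in a real WESTPA setup you would likely subclass WESTPA's BinMapper base class (assumed to be importable from westpa.core.binning; check your installed version).

```python
import numpy as np

class ThresholdBinMapper:
    """Hypothetical mapper: bin 0 if the first pcoord is below threshold, else bin 1."""

    def __init__(self, threshold):
        self.threshold = threshold

    @property
    def nbins(self):
        # Total number of bins this mapper can assign
        return 2

    def assign(self, coords, mask=None, output=None):
        coords = np.asarray(coords, dtype=float)
        if mask is None:
            # No mask given: process every coordinate tuple
            mask = np.ones(len(coords), dtype=bool)
        if output is None:
            output = np.zeros(len(coords), dtype=np.uint16)
        below = coords[:, 0] < self.threshold
        # Only unmasked entries are written; masked entries keep their old values
        output[mask & below] = 0
        output[mask & ~below] = 1
        # Return a reference to output, as the interface requires
        return output

mapper = ThresholdBinMapper(1.0)
print(mapper.assign(np.array([[0.5], [1.5]])))  # [0 1]
```

Returning the same output array that was passed in lets a caller such as a nested mapper reuse one buffer across several assign() calls.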

RectilinearBinMapper

Creates an N-dimensional grid of bins. The Rectilinear bin mapper is initialized by defining a set of bin boundaries:

  self.bin_mapper = RectilinearBinMapper(boundaries)

where boundaries is a list or other iterable containing the bin boundaries along each dimension. The bin boundaries must be monotonically increasing along each dimension. It is important to note that a one-dimensional bin space must still be represented as a list of lists as in the following example:

  bounds = [-float('inf'), 0.0, 1.0, 2.0, 3.0, float('inf')]
  self.bin_mapper = RectilinearBinMapper([bounds])

A two-dimensional system might look like:

  boundaries = [(-1,-0.5,0,0.5,1), (-1,-0.5,0,0.5,1)]
  self.bin_mapper = RectilinearBinMapper(boundaries)

where the first tuple in the list defines the boundaries along the first progress coordinate and the second tuple defines the boundaries along the second. More generally, boundaries can contain one sequence of bin boundaries per progress coordinate dimension, creating an N-dimensional grid that discretizes progress coordinate space.
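To make the boundary semantics concrete, here is a NumPy-only sketch of rectilinear assignment. It is not WESTPA's implementation. It assumes each bin is half-open, [lower, upper), and that per-dimension indices are combined in row-major (C) order with numpy.ravel_multi_index; WESTPA's actual bin numbering may differ. The function name rectilinear_assign is hypothetical.

```python
import numpy as np

def rectilinear_assign(boundaries, coords):
    """Assign each row of coords (shape (n, ndim)) to a flat bin index.

    boundaries holds one increasing sequence of bin edges per dimension.
    Bins are treated as half-open intervals [edge_i, edge_{i+1}).
    """
    coords = np.asarray(coords, dtype=float)
    shape = tuple(len(edges) - 1 for edges in boundaries)
    per_dim = []
    for d, edges in enumerate(boundaries):
        # digitize returns i such that edges[i-1] <= x < edges[i]
        idx = np.digitize(coords[:, d], edges) - 1
        if np.any((idx < 0) | (idx >= len(edges) - 1)):
            raise ValueError(f"coordinate outside the boundaries in dimension {d}")
        per_dim.append(idx)
    # Combine the per-dimension indices into one flat index (row-major order)
    return np.ravel_multi_index(tuple(per_dim), shape)

bounds = [-float('inf'), 0.0, 1.0, 2.0, 3.0, float('inf')]
print(rectilinear_assign([bounds], [[-5.0], [0.5], [2.0], [10.0]]))  # [0 1 3 4]
```

The infinite outer boundaries guarantee that every finite value lands in some bin. A value that falls outside all boundaries raises an error instead of being silently dropped.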

PiecewiseBinMapper

The PiecewiseBinMapper uses a set of boolean-valued functions, one per bin, to determine assignments. This is likely to be much slower than a FuncBinMapper or VectorizingFuncBinMapper equipped with an appropriate function, and its use is discouraged.
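The sketch below illustrates the piecewise idea with hypothetical names: each bin has a predicate, and a coordinate goes to the first bin whose predicate accepts it. It also shows why the approach scales poorly, since every predicate is evaluated over every coordinate. This page does not describe the constructor or tie-breaking rule of WESTPA's PiecewiseBinMapper, so the first-match rule here is an assumption.

```python
import numpy as np

def piecewise_assign(predicates, coords):
    """Hypothetical sketch: bin i receives the coordinates for which predicates[i] is True.

    Each predicate takes the full (n, ndim) coordinate array and returns an (n,)
    bool array. The first predicate that matches a coordinate wins.
    """
    coords = np.asarray(coords, dtype=float)
    output = np.full(len(coords), -1, dtype=np.int64)
    for ibin, predicate in enumerate(predicates):
        # One full pass over every coordinate per bin, so the cost grows with nbins
        hits = predicate(coords) & (output < 0)
        output[hits] = ibin
    if np.any(output < 0):
        raise ValueError("some coordinates matched no bin")
    return output

predicates = [lambda c: c[:, 0] < 1.0, lambda c: c[:, 0] >= 1.0]
print(piecewise_assign(predicates, [[0.5], [1.5]]))  # [0 1]
```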

RecursiveBinMapper

The RecursiveBinMapper is used for assembling more complex bin spaces from simpler components and nesting one set of bins within another. It is initialized as:

  self.bin_mapper = RecursiveBinMapper(base_mapper, start_index=0)

The base_mapper is an instance of one of the other bin mappers, and start_index is an optional offset for the bin indices. Additional mappers can then be nested inside the base_mapper using the add_mapper(mapper, replaces_bin_at) method, which replaces the bin containing the coordinate tuple replaces_bin_at with the bins of mapper.

As a simple example, consider a bin space in which the base_mapper assigns segments with progress coordinate values < 1 to one bin and values >= 1 to another. Within the former bin, we nest a second mapper that assigns progress coordinate values < 0.5 to one bin and values >= 0.5 to another. The bin space and the corresponding code look like this:

  0                            1                      2
  +----------------------------+----------------------+
  |            0.5             |                      |
  | +-----------+------------+ |                      |
  | |           |            | |                      |
  | |     1     |     2      | |          0           |
  | |           |            | |                      |
  | |           |            | |                      |
  | +-----------+------------+ |                      |
  +---------------------------------------------------+

  from westpa.core.binning import FuncBinMapper, RecursiveBinMapper

  def fn1(coords, mask, output):
      # Outer mapper: bin 0 for values < 1, bin 1 for values >= 1
      test = coords[:, 0] < 1
      output[mask & test] = 0
      output[mask & ~test] = 1

  def fn2(coords, mask, output):
      # Inner mapper: bin 0 for values < 0.5, bin 1 for values >= 0.5
      test = coords[:, 0] < 0.5
      output[mask & test] = 0
      output[mask & ~test] = 1

  outer_mapper = FuncBinMapper(fn1, 2)
  inner_mapper = FuncBinMapper(fn2, 2)
  rmapper = RecursiveBinMapper(outer_mapper)
  # Replace the outer bin containing 0.5 (values < 1) with inner_mapper
  rmapper.add_mapper(inner_mapper, [0.5])

Examples of more complicated nesting schemes can be found in the tests for the WESTPA binning apparatus.

FuncBinMapper

A bin mapper that uses a user-defined function to calculate bin assignments directly for a set of coordinate values. The function is responsible for iterating over the entire coordinate set. This is best used with C/Cython/Numba functions or carefully tuned NumPy-based Python functions.

The FuncBinMapper is initialized as:

  self.bin_mapper = FuncBinMapper(func, nbins, args=None, kwargs=None)

where func is the user-defined method to assign coordinates to bins, nbins is the number of bins in the partitioning space, and args and kwargs are optional positional and keyword arguments, respectively, that are passed into func when it is called.

The user-defined function should have the following form:

  def func(coords, mask, output, *args, **kwargs):
      ...

The assignments are written into the output array, which is modified in place.

As a contrived example, the following function assigns a segment to bin 0 if the sum of its first two progress coordinates is less than s*0.5, and to bin 1 otherwise, where s=1.5:

  def func(coords, mask, output, s):
      # Write assignments only for unmasked coordinate tuples
      below = coords[:, 0] + coords[:, 1] < s * 0.5
      output[mask & below] = 0
      output[mask & ~below] = 1
  ...
  self.bin_mapper = FuncBinMapper(func, 2, args=(1.5,))
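Because a FuncBinMapper function receives coords, mask, and output followed by any extra args, you can check it by hand with NumPy arrays before starting a run. The sketch below uses a mask-aware version of the function above. The coordinates, the mask, and the placeholder value 9 (marking entries that were not written) are made up for illustration.

```python
import numpy as np

def func(coords, mask, output, s):
    # Same rule as above, written only into unmasked entries of output
    below = coords[:, 0] + coords[:, 1] < s * 0.5
    output[mask & below] = 0
    output[mask & ~below] = 1

coords = np.array([[0.2, 0.3], [0.5, 0.5], [0.1, 0.1]])
mask = np.array([True, True, False])                # third tuple is skipped
output = np.full(len(coords), 9, dtype=np.uint16)   # 9 marks "not written"
func(coords, mask, output, 1.5)
print(output)  # [0 1 9]
```

The third entry keeps its placeholder because it is masked out. Leaving masked entries untouched is what lets a recursive mapper combine several mappers' results in one output array.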

A full example of the functional bin mapper can be found in the original MAB scheme code: https://github.com/westpa/user_submitted_scripts/tree/main/Adaptive_Binning/adaptive_2.0

VectorizingFuncBinMapper

Like the FuncBinMapper, the VectorizingFuncBinMapper uses a user-defined function to calculate bin assignments. The difference is that the function passed to a FuncBinMapper must iterate over all coordinate sets passed to it, whereas the function passed to a VectorizingFuncBinMapper is evaluated once for each unmasked coordinate tuple and returns a single bin index. It does not need to iterate over multiple progress coordinate sets itself.

The VectorizingFuncBinMapper is initialized as:

  self.bin_mapper = VectorizingFuncBinMapper(func, nbins, args=None, kwargs=None)

The user-defined function should have the following form:

  def func(coords, *args, **kwargs):
      ...

Mirroring the FuncBinMapper example, the following function should give the same assignments for a given set of coordinates. Here segments are assigned to bin 0 if the sum of their first two progress coordinates is less than s*0.5, and to bin 1 otherwise, where s=1.5:

  def func(coords, s):
      if coords[0] + coords[1] < s*0.5:
          return 0
      else:
          return 1
  ....
  self.bin_mapper = VectorizingFuncBinMapper(func, 2, args=(1.5,))
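The difference from FuncBinMapper is easiest to see as the loop the mapper runs on your behalf. The helper vectorized_assign below is a hypothetical, conceptually equivalent stand-in for that loop, not WESTPA's code; the real mapper's internals may differ. The per-tuple func is the one shown above.

```python
import numpy as np

def func(coords, s):
    # Called with ONE coordinate tuple; returns a single bin index
    if coords[0] + coords[1] < s * 0.5:
        return 0
    else:
        return 1

def vectorized_assign(func, coords, mask=None, output=None, args=(), kwargs=None):
    """Hypothetical stand-in for the per-tuple loop a vectorizing mapper performs."""
    kwargs = kwargs or {}
    coords = np.asarray(coords, dtype=float)
    if mask is None:
        mask = np.ones(len(coords), dtype=bool)
    if output is None:
        output = np.zeros(len(coords), dtype=np.uint16)
    # Evaluate func once per unmasked coordinate tuple
    for i in np.flatnonzero(mask):
        output[i] = func(coords[i], *args, **kwargs)
    return output

print(vectorized_assign(func, [[0.2, 0.3], [0.5, 0.5]], args=(1.5,)))  # [0 1]
```

The per-tuple form is simpler to write, but the Python-level loop makes it slower than a well-vectorized FuncBinMapper function for large numbers of walkers.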

VoronoiBinMapper

A one-dimensional mapper, in the sense that its bins are indexed along a single dimension: it assigns each (possibly multidimensional) progress coordinate to the bin of the closest center, based on a user-supplied distance function. The Voronoi bin mapper is initialized with the following signature within WESTSystem.initialize:

  self.bin_mapper = VoronoiBinMapper(dfunc, centers, dfargs=None, dfkwargs=None)

centers is a numpy array of shape (n_centers, pcoord_ndim) defining the generators of the Voronoi cells.

dfunc is a Python function that returns an array of shape (n_centers,) containing the distances between a single set of progress coordinates for a segment and each of the centers defining the Voronoi tessellation. It takes the general form:

  def dfunc(p, centers, *dfargs, **dfkwargs):
      ...
      return d

where p is the progress coordinate of a single segment at one time point, with shape (pcoord_ndim,), and centers is the full set of centers. The bin mapper's assign method then assigns each progress coordinate to the bin of the closest center (minimum distance). It is the user's responsibility to ensure that the distance is calculated using an appropriate metric.

dfargs is an optional list or tuple of positional arguments to pass into dfunc.

dfkwargs is an optional dict of keyword arguments to pass into dfunc.
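As an illustration of dfunc and dfkwargs, here is a sketch of a distance function with an optional periodic wrap, for example for dihedral-angle progress coordinates. The period keyword is hypothetical, and voronoi_assign is a hypothetical helper that applies the minimum-distance rule described above; neither is part of WESTPA.

```python
import numpy as np

def dfunc(p, centers, period=None):
    """Distances from one progress coordinate p, shape (pcoord_ndim,), to every center.

    If period is given, each difference is wrapped to the nearest periodic image
    (useful for angles); otherwise this is the plain Euclidean distance.
    """
    delta = np.asarray(centers, dtype=float) - np.asarray(p, dtype=float)
    if period is not None:
        delta -= period * np.round(delta / period)
    return np.sqrt((delta ** 2).sum(axis=1))

def voronoi_assign(coords, centers, dfunc, dfargs=(), dfkwargs=None):
    """Hypothetical helper: index of the closest center for each row of coords."""
    dfkwargs = dfkwargs or {}
    return np.array([np.argmin(dfunc(p, centers, *dfargs, **dfkwargs)) for p in coords])

# In WESTSystem.initialize, this dfunc could be passed as, for example:
#   self.bin_mapper = VoronoiBinMapper(dfunc, centers, dfkwargs={'period': 360.0})
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(voronoi_assign([[0.1, 0.1], [0.9, 0.2], [0.2, 0.8]], centers, dfunc))  # [0 1 2]
```

With period=360.0, an angle of 350 is 20 degrees away from a center at 10 rather than 340 degrees, so it is assigned to that center instead of one at 180.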

Where to Specify

Binning schemes can be defined in the west.cfg file as part of the system section. Below is an example of a 1-D rectilinear binning scheme:

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 1
      bin_target_counts: 10
      bins:
        type: RectilinearBinMapper
        boundaries:
          - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'inf']

To define 2-D rectilinear bins, the following can be used:

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 2
      bin_target_counts: 10
      bins:
        type: RectilinearBinMapper
        boundaries:
          - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'inf']
          - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'inf']

To nest one bin mapper inside another, with an "outer" and an "inner" mapper as described above for the RecursiveBinMapper, the following examples are a good starting point.

To place a rectilinear bin mapper inside of another rectilinear bin mapper:

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 1
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 'inf']
        mappers:
          - type: RectilinearBinMapper
            boundaries:
              - [4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9,
                 5, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9,
                 6]
            at: [5]

The RecursiveBinMapper takes a base mapper and one or more mappers that are placed inside bins of the base. In most cases, the base mapper should be rectilinear; in this example, the inner mapper is rectilinear as well. The at parameter of each entry under mappers gives a coordinate that tells the recursive bin mapper which base bin to replace with that mapper. In the above example, the inner mapper is placed in the base bin containing 5, which spans from 4 to 6. Multiple inner mappers can be added under the mappers section.
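Before a run, it can help to know how many bins a nested scheme creates, since with bin_target_counts walkers per bin the total number of walkers grows to roughly nbins × bin_target_counts once every bin is occupied. Judging from the RecursiveBinMapper diagram earlier on this page (two outer bins, one of which is replaced by two inner bins, giving bins 0 to 2), each nested mapper replaces one base bin with its own bins. The back-of-the-envelope sketch below applies that rule to the YAML example above. Both helper names are hypothetical.

```python
def rectilinear_nbins(boundaries):
    """Number of bins in a rectilinear grid: product of (edges - 1) per dimension."""
    nbins = 1
    for edges in boundaries:
        nbins *= len(edges) - 1
    return nbins

def recursive_nbins(base_nbins, inner_nbins):
    """Each nested mapper replaces one base bin with its own bins."""
    return base_nbins - len(inner_nbins) + sum(inner_nbins)

inf = float('inf')
base = rectilinear_nbins([[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, inf]])        # 10 bins
inner = rectilinear_nbins([[round(4 + 0.1 * i, 1) for i in range(21)]])  # 20 bins, 4.0 to 6.0
total = recursive_nbins(base, [inner])
print(total, total * 10)  # 29 290 (bins, and walkers at bin_target_counts: 10)
```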

The Minimal Adaptive Binning Scheme

The minimal adaptive binning (MAB) scheme has its own mapper (MABBinMapper) which can be used as an inner mapper in any recursive scheme. Below is an example of how this would look with a 1-D setup:

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 1
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
        mappers:
          - type: MABBinMapper
            nbins: [10]
            bottleneck: true
            at: [5]

This confines MAB binning to the outer bin that spans 2 to 6 and uses 10 MAB bins between the extrema walkers in that region. Here is an example with a 2-D scheme:
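A simplified picture of how the MAB bins are laid out along one dimension is nbins bins of equal width between the lowest and highest walkers in the region. This matches the example binbounds.log output later on this page, where five bins span the minimum and maximum progress coordinates in equal steps. The sketch below ignores the extra bins MAB reserves for extrema and bottleneck walkers, and the function name mab_boundaries is hypothetical.

```python
import numpy as np

def mab_boundaries(pcoords, nbins):
    """Equal-width bin boundaries between the extrema walkers (simplified MAB layout)."""
    lo, hi = float(np.min(pcoords)), float(np.max(pcoords))
    return np.linspace(lo, hi, nbins + 1)

# Walkers whose extrema match iteration 1 of the binbounds.log example
print(mab_boundaries([6.64947462, 12.0, 19.94796371], 5))
```

Because the boundaries follow the extrema every iteration, the bins stretch as walkers advance, which is what lets MAB keep splitting at the leading edge without a fixed grid.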

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 2
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
            - [0, 2, 6, 8, 10, 'inf']
        mappers:
          - type: MABBinMapper
            nbins: [10,10]
            bottleneck: true
            at: [5,5]

When defining a target state for a rectilinear-MAB scheme, make sure the target is defined in terms of a fixed outer bin. For example, in the 1-D MAB scheme above, a target state defined as the bin into which the point 1.9 falls corresponds to the outer bin from 0 to 2. This works well because MAB binning is restricted to the outer bin from 2 to 6.

MAB Binning in One Direction

To use MAB binning in only one direction, add the direction keyword to your MABBinMapper block:

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 1
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
        mappers:
          - type: MABBinMapper
            nbins: [10]
            direction: [-1]
            bottleneck: true
            at: [5]

A direction of [1] directs MAB to drive binning toward higher values along a progress coordinate, and a direction of [-1] drives binning toward lower values. In practice, direction 1 creates a separate bin for the leading (highest-valued) walker, and direction -1 creates a separate bin for the lagging (lowest-valued) walker. If a two-dimensional progress coordinate is used, a direction can be specified for each dimension. For example, [-1,-1] drives MAB binning toward lower values in both dimensions simultaneously, and [0,1] drives MAB binning in both directions in the first dimension and only toward higher values in the second dimension. A direction of [0] splits both the leading and lagging walkers and is the default if no direction is specified in west.cfg. If you prefer a 'lite' version of MAB in which neither the leading nor the lagging walker is split, use a direction of [86]. This can keep MAB from getting stuck in dead ends, where walkers are split repeatedly until they reach extremely small weights.
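The direction codes can be summarized as a lookup from each value to the extrema walker(s) that receive a bin of their own in that dimension. The hypothetical helper below only restates the rules in this section; it is not part of WESTPA, and it does not model bottleneck splitting.

```python
def mab_extrema(direction):
    """For each pcoord dimension, list which extrema walkers get their own MAB bin."""
    rules = {
        0: ("min", "max"),   # default: split both lagging and leading walkers
        1: ("max",),         # drive toward higher values: leading walker only
        -1: ("min",),        # drive toward lower values: lagging walker only
        86: (),              # 'lite' MAB: neither extremum is split
    }
    result = []
    for d in direction:
        if d not in rules:
            raise ValueError(f"unsupported direction value: {d}")
        result.append(rules[d])
    return result

print(mab_extrema([0, 1]))  # [('min', 'max'), ('max',)]
```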

How to "Skip" Splitting in a MAB Binning Dimension

This feature should work correctly in v2022.05 and later releases.

If you would like to include a value in the progress coordinate without binning along that dimension (for example, when the value is needed only to define the target state), add the optional skip parameter to the MABBinMapper section.

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 2
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
            - ['-inf', 'inf']
        mappers:
          - type: MABBinMapper
            nbins: [10, 1]
            direction: [-1, 0]
            skip: [0, 1]
            bottleneck: true
            at: [5, 1]

The skip option is a list with one value per progress coordinate dimension. A value of 0 means that normal MAB binning and splitting (including bottleneck splitting) is done, while a value of 1 means that the dimension is "skipped". The number of bins and the direction of a skipped dimension don't matter, because the code never uses them when skipping is requested.

How to Print MAB Info in the west.log File

If you would like to see statistics related to your MAB binning in real time (such as the progress coordinates of the extrema walkers, directions, etc.), you can turn on MAB logging by enabling the mab_log parameter; the information is then printed to the west.log file. Below is an example of how to enable MAB logging.

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 2
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
            - ['-inf', 'inf']
        mappers:
          - type: MABBinMapper
            nbins: [10, 1]
            direction: [-1, 0]
            skip: [0, 1]
            bottleneck: true
            mab_log: true
            at: [5, 1]

How to Print MAB Bins into binbounds.log File

If you would like to export the MAB bin boundaries at each iteration for later use, you can turn on bin boundary logging by enabling the bin_log parameter, as in the example below. This option is only available in v2022.06 or later.

In v2022.09 or later, you can specify the file name using bin_log_path.

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 2
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
        mappers:
          - type: MABBinMapper
            nbins: [5]
            direction: [0]
            bin_log: true
            bin_log_path: $WEST_SIM_ROOT/binbounds.log
            at: [5]

The following is an example output. The comments are for clarity and are not part of the output.

    iteration: 1
    bin boundaries: [ 6.64947462  9.30917244 11.96887026 14.62856808 17.2882659  19.94796371] 
    min/max pcoord: [6.6494746] [19.947964]  
    bottleneck bins: 0
    bottleneck pcoord: [None] [None]

    iteration: 2  # Iteration Number
    bin boundaries: [ 6.73101187  9.44333153 12.15565119 14.86797085 17.58029051 20.29261017]  # Bin boundaries. Additional lists are printed depending on the shape of your pcoord (e.g., 2D progress coordinate).
    min/max pcoord: [6.731012] [20.29261]  # First list contains the minimum progress coordinate(s), the second list contains the max progress coordinate(s).
    bottleneck bins: 2  # Number of bottleneck bins filled.
    bottleneck pcoord: [6.833282] [19.568398]  # Each list shows the bottleneck walker for either direction. None implies no bottleneck walker was chosen for that specific (direction, dimension) combination.
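To reuse the logged boundaries, for example to plot how the MAB bins move over time, the file can be parsed line by line. The sketch below is written against the 1-D layout shown above; the output for multi-dimensional progress coordinates prints additional lists and may need different handling. The function name read_binbounds is hypothetical.

```python
def read_binbounds(text):
    """Parse 'iteration' / 'bin boundaries' records from 1-D binbounds.log text.

    Returns {iteration: [boundary, ...]}. Anything after '#' is ignored,
    so the annotated example above can be parsed as well.
    """
    bounds = {}
    iteration = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop explanatory comments
        if line.startswith("iteration:"):
            iteration = int(line.split(":", 1)[1])
        elif line.startswith("bin boundaries:") and iteration is not None:
            inside = line.split("[", 1)[1].split("]", 1)[0]
            bounds[iteration] = [float(x) for x in inside.split()]
    return bounds

# Usage (path as configured with bin_log_path):
#   with open("binbounds.log") as fh:
#       bounds = read_binbounds(fh.read())
```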

Binless Schemes

WESTPA 2.0 introduces a new BinlessMapper which enables the use of custom "binless" grouping schemes for splitting and merging walkers. Below is an example of how this would look with a 1-D setup:

  system:
    system_options:
      pcoord_dtype: !!python/name:numpy.float32 ''
      pcoord_len: 21
      pcoord_ndim: 1
      bin_target_counts: 10
      bins:
        type: RecursiveBinMapper
        base:
          type: RectilinearBinMapper
          boundaries:
            - [0, 2, 6, 8, 10, 'inf']
        mappers:
          - type: BinlessMapper
            ngroups: 5
            ndims: 1
            group_function: group.kmeans
            at: [5]

This employs a binless scheme within the outer bin that spans from 2 to 6 and groups walkers according to the kmeans function defined in group.py (referenced as group.kmeans in group_function). An example grouping function is shown below.

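  import logging

  import numpy
  from sklearn import cluster

  log = logging.getLogger(__name__)
  log.debug('loading module %r', __name__)

  def kmeans(coords, n_clusters, splitting, **kwargs):
      # Arrange the walkers' progress coordinates as an (n_walkers, n_dims) array
      X = numpy.array(coords)
      if X.shape[0] == 1:
          X = X.reshape(-1, 1)
      km = cluster.KMeans(n_clusters=n_clusters).fit(X)
      cluster_centers = km.cluster_centers_
      labels = km.labels_  # one group label per walker
      if splitting:
          # Report the centers ordered by their first coordinate
          print("cluster centers:", cluster_centers[numpy.argsort(cluster_centers[:, 0])])
      return labels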

This example grouping function groups walkers for splitting and merging with the k-means algorithm. The current BinlessMapper implementation only supports grouping in up to two dimensions, with expanded support expected in the future.
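To see what the returned labels mean without installing scikit-learn, here is a NumPy-only sketch of the same idea for a 1-D progress coordinate. It swaps in a plain Lloyd's k-means iteration with deterministic starting centers in place of sklearn's KMeans. It mirrors the signature of the kmeans example above, but it has not been checked against BinlessMapper. The name kmeans_1d is hypothetical.

```python
import numpy as np

def kmeans_1d(coords, n_clusters, splitting, **kwargs):
    """Group 1-D progress coordinates with Lloyd's k-means; returns one label per walker."""
    x = np.asarray(coords, dtype=float).ravel()
    # Deterministic start: centers evenly spaced between the extreme walkers
    centers = np.linspace(x.min(), x.max(), n_clusters)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(100):
        # Assign each walker to its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its walkers (keep it if the group is empty)
        new_centers = np.array([x[labels == k].mean() if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    if splitting:
        print("cluster centers:", np.sort(centers))
    return labels

print(kmeans_1d([0.1, 0.2, 0.15, 3.0, 3.1, 5.9, 6.0], 3, splitting=False))  # [0 0 0 1 1 2 2]
```

Walkers that share a label form one group, and splitting and merging then act within each group much as they would within a bin.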