Background - UTNuclearRobotics/utexas_sterling GitHub Wiki

Convolution

  • Filter (Kernel): A small matrix of weights that is applied to the input data. Common sizes are 3x3, 5x5, etc.
  • Stride: The number of pixels by which the filter moves across the input data. A stride of 1 means the filter moves one pixel at a time, while a stride of 2 means it moves two pixels at a time.
  • Padding: Adding extra pixels (usually zeros) around the input data to control the spatial dimensions of the output feature map. Common padding types are 'valid' (no padding, so the output shrinks) and 'same' (enough padding that, at a stride of 1, the output size equals the input size).
  • Feature Map: The output of the convolution operation, which highlights specific patterns detected by the filter.
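
The way filter size, stride, and padding combine to set the feature map size can be sketched in a few lines. This is a minimal illustration rather than code from this repository: the helper names conv_output_size and same_padding are hypothetical, and the formula assumed is the standard floor((n + 2p - f) / s) + 1 for an input of width n, filter width f, per-side padding p, and stride s.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output width (or height) of a convolution along one dimension.

    Assumes the standard formula floor((n + 2p - f) / s) + 1, where
    `padding` is the number of pixels added on EACH side of the input.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if kernel_size > input_size + 2 * padding:
        raise ValueError("kernel is larger than the padded input")
    return (input_size + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size):
    """Per-side padding that preserves the input size at stride 1.

    Only exact for odd kernel sizes (3, 5, 7, ...).
    """
    return (kernel_size - 1) // 2


# 'valid' padding: a 3x3 filter over a 5x5 input gives a 3x3 feature map.
valid_size = conv_output_size(5, 3, stride=1, padding=0)

# 'same' padding: one pixel of padding per side keeps the size at 5.
same_size = conv_output_size(5, 3, stride=1, padding=same_padding(3))

# A stride of 2 with no padding shrinks the 5x5 input to 2x2.
strided_size = conv_output_size(5, 3, stride=2, padding=0)
```

With these inputs, valid_size is 3, same_size is 5, and strided_size is 2.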

Example

Input Image:

1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Filter (Kernel):

1 0 1
0 1 0
1 0 1

Convolution Operation:

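Placing the filter over the top-left 3x3 region of the image, multiplying each filter weight by the pixel beneath it, and summing the products gives the first value of the feature map:

1*1 + 0*1 + 1*1 + 0*0 + 1*1 + 0*1 + 1*0 + 0*0 + 1*1 = 4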
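
Sliding the filter over every position produces the full feature map. The sketch below does this in plain Python for the image and filter above, assuming 'valid' padding and a configurable stride. The function name conv2d is a hypothetical helper, not part of this repository. Like most deep learning libraries, it does not flip the kernel (strictly speaking this is cross-correlation), which makes no difference here because the filter is symmetric.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` with no padding and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1

    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each kernel weight by the pixel beneath it and sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += kernel[i][j] * image[r * stride + i][c * stride + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)            # stride 1 -> 3x3 output
strided_map = conv2d(image, kernel, stride=2)  # stride 2 -> 2x2 output

for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

With a stride of 2 the filter visits only every other position, so strided_map keeps the four corner values of the stride-1 result.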

Silhouette Score

The silhouette score is a metric used to evaluate the quality of clusters created by a clustering algorithm. It measures how similar an object is to its own cluster compared to other clusters. The overall silhouette score for a dataset is the mean silhouette score for all samples.

The silhouette score ranges from -1 to 1, where:

  • 1 indicates that the object is well-matched to its own cluster and poorly matched to neighboring clusters.
  • 0 indicates that the object is on or very close to the decision boundary between two neighboring clusters.
  • -1 indicates that the object is poorly matched to its own cluster and well-matched to a neighboring cluster.
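
For a single sample i, the score is commonly defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its own cluster and b(i) is the smallest mean distance from i to the members of any other cluster. The sketch below implements that definition in plain Python so the arithmetic is visible. The function names and the four toy points are illustrative assumptions, Euclidean distance is assumed, and a sample alone in its cluster is given a score of 0 by convention.

```python
import math


def silhouette_samples(points, labels):
    """Return the silhouette coefficient s(i) for every point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(               # separation: nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


# Two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette_score(points, [0, 0, 1, 1])  # pairs grouped correctly
bad = silhouette_score(points, [0, 1, 0, 1])   # each cluster spans both pairs
```

Here good comes out close to 1 (about 0.90), while bad is negative (about -0.45), matching the interpretation of the range above. In practice a library routine such as scikit-learn's silhouette_score would normally be used instead of a hand-written loop.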

Relevance

The silhouette score is significant because it provides an intuitive measure of how well-defined the clusters are. It helps in the following ways:

  • Cluster Validation: It helps in validating the consistency within clusters of data. A high silhouette score indicates that the clusters are well-separated and distinct.
  • Model Selection: It aids in selecting the optimal number of clusters (k) in clustering algorithms like k-means. By comparing silhouette scores for different values of k, one can choose the value that maximizes the silhouette score, indicating the best clustering structure.
  • Performance Comparison: It allows for the comparison of different clustering algorithms or different parameter settings for the same algorithm.

Context

Here, the silhouette score is used to choose the best k-means clustering model. The model-selection function iterates over a range of cluster counts (k), fits a k-means model for each, and computes its silhouette score. The model with the highest score is selected as the best model, since it gives the most well-defined clustering structure for the given data.
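
A selection loop of this kind might look like the sketch below. It is not the repository's implementation: the function name select_best_kmeans, the range of k values, and the synthetic data are assumptions made for illustration, and it relies on scikit-learn's KMeans and silhouette_score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def select_best_kmeans(X, k_values, random_state=0):
    """Fit k-means for each k and keep the model with the highest silhouette score."""
    best_model, best_k, best_score = None, None, -1.0
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        score = silhouette_score(X, labels)  # mean silhouette over all samples
        if score > best_score:
            best_model, best_k, best_score = model, k, score
    return best_model, best_k, best_score


# Synthetic data for illustration only: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# The silhouette score is undefined for a single cluster, so k starts at 2.
model, best_k, best_score = select_best_kmeans(X, range(2, 7))
print(best_k, round(best_score, 3))
```

On data like this, where the three groups are clearly separated, the loop should settle on k = 3. On real data the maximum is often less sharp, so it can be worth inspecting the scores for all values of k rather than only the winner.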