2.4.4. Miscellaneous Functions
2.4.4.1. Adaptive Partitioning
Methods for partitioning the birth-lifetime plane of persistence diagrams. These are used in the adaptive partitioning version of template function featurization.
- class teaspoon.SP.adaptivePart.Partitions(data=None, convertToOrd=False, meshingScheme=None, partitionParams={}, **kwargs)
A data structure for storing a partition produced by an adaptive meshing scheme.
- Parameters:
data (np.array) – A many-by-2 numpy array.
convertToOrd (bool) – Whether to convert the data to ordinals before partitioning. Ordinals make partitioning faster but may produce less well-shaped partitions.
meshingScheme (str) –
The type of meshing scheme you want to use. Options include:
‘DV’: a recursive method based on (mention paper here). For more details, see the function return_partition_DV.
‘clustering’: uses a clustering algorithm to find clusters in the data, then takes the bounding box of all points assigned to each cluster. For more details, see the function return_partition_clustering.
partitionParams (dict) –
Dictionary of parameters needed for the selected meshing scheme. For an explanation of the parameters, see the function for that meshingScheme (i.e. return_partition_DV or return_partition_clustering).
For ‘DV’ the adjustable parameters are ‘alpha’, ‘c’, ‘nmin’, ‘numParts’, ‘split’.
For ‘clustering’ the adjustable parameters are ‘numClusters’, ‘clusterAlg’, ‘weights’, ‘boxOption’, ‘boxWidth’, ‘split’.
kwargs – Any leftover inputs are stored as attributes.
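When convertToOrd is True, partitioning happens in ordinal (rank) coordinates rather than on the raw values. The following is a minimal numpy sketch of such a conversion, assuming 0-based ranks computed independently for each column, with ties broken by row order. The helper to_ordinal is hypothetical and is not part of teaspoon.

```python
import numpy as np


def to_ordinal(data):
    """Hypothetical helper: replace each column of a many-by-2 array by its ranks.

    Ranks are 0-based; ties are broken by row order (stable sort).
    """
    data = np.asarray(data, dtype=float)
    ordinals = np.empty(data.shape, dtype=int)
    for col in range(data.shape[1]):
        # argsort lists row indices in sorted order; writing 0..n-1 into
        # those rows gives each point its rank within the column
        order = np.argsort(data[:, col], kind="stable")
        ordinals[order, col] = np.arange(data.shape[0])
    return ordinals


# rows are (birth, lifetime) pairs
diagram = np.array([[0.5, 1.2], [0.1, 0.3], [0.9, 0.7]])
print(to_ordinal(diagram))  # → [[1 2] [0 0] [2 1]]
```

Because only the ranks are kept, the partition no longer depends on how the points are spaced along each axis. This is one plausible reason ordinal partitions are cheaper to compute but less faithful to the geometry of the original data.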
- convertOrdToFloat(partitionEntry)
Converts the nodes of a partition entry from ordinals back to floats.
- Parameters:
partitionEntry (dict) – The partition that you want to convert.
- Returns:
The partition entry with converted nodes. The dictionary entry is also set to the converted version.
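Converting back requires a lookup from rank to original value. Assuming 0-based ranks, the sorted values of each column can serve as that lookup table: ordinal k in a column maps to the k-th smallest value in that column. The nodes layout [Xmin, Xmax, Ymin, Ymax] follows this page. The helper ord_nodes_to_float is a hypothetical illustration, not the teaspoon implementation.

```python
import numpy as np


def ord_nodes_to_float(nodes, data):
    """Hypothetical helper: map ordinal partition bounds back to data values.

    nodes: [Xmin, Xmax, Ymin, Ymax] in 0-based ordinal coordinates.
    data:  the original many-by-2 float array the ordinals were computed from.
    """
    xs = np.sort(data[:, 0])  # k-th smallest x value for ordinal k
    ys = np.sort(data[:, 1])  # k-th smallest y value for ordinal k
    xmin, xmax, ymin, ymax = (int(v) for v in nodes)
    return np.array([xs[xmin], xs[xmax], ys[ymin], ys[ymax]])


data = np.array([[0.5, 1.2], [0.1, 0.3], [0.9, 0.7]])
print(ord_nodes_to_float([0, 1, 1, 2], data))  # → [0.1 0.5 0.7 1.2]
```

This sketch only handles integer bounds inside 0..n-1. A real implementation may also need to deal with bounds that fall between ranks.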
- getOrdinal(key)
An alternative to the built-in __getitem__ magic method for the case where the original data were non-ordinal but you still want the partition back in ordinal coordinates. If the data were not converted to ordinals, this has exactly the same effect as self[key].
- isOrdinal(dd)
Helper function for error checking, used to make sure the input is in ordinal coordinates. It checks that each of the two data columns, when sorted, equals an ordered vector with the same number of rows.
- Parameters:
dd – Data in a many-by-2 numpy array.
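The check described above can be sketched as follows. The page only says "an ordered vector with the same number of rows". This sketch assumes that vector is 0, 1, ..., n-1. The function name is_ordinal is hypothetical.

```python
import numpy as np


def is_ordinal(dd):
    """Sketch: True if each column, once sorted, equals 0, 1, ..., n-1."""
    dd = np.asarray(dd)
    expected = np.arange(dd.shape[0])  # assumed 0-based ordered vector
    return all(np.array_equal(np.sort(dd[:, col]), expected) for col in range(2))


print(is_ordinal(np.array([[1, 2], [0, 0], [2, 1]])))              # → True
print(is_ordinal(np.array([[0.5, 1.2], [0.1, 0.3], [0.9, 0.7]])))  # → False
```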
- iterOrdinal()
Works just like the __iter__ magic method, but without converting each entry back to floats.
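The difference between default access and the ordinal accessors (getOrdinal, iterOrdinal) can be illustrated with a toy container. This is not the teaspoon class. OrdinalStore, its attributes, its snake_case method names, and the conversion via sorted column values are all hypothetical. It stores nodes in ordinal coordinates and converts them to floats on indexing and iteration. The two ordinal accessors return the stored entries unchanged.

```python
import numpy as np


class OrdinalStore:
    """Toy container (hypothetical): partitions stored as [Xmin, Xmax, Ymin, Ymax] nodes."""

    def __init__(self, data, partitions, convert_to_ord=True):
        self.data = np.asarray(data, dtype=float)  # original float data
        self.partitions = partitions  # list of dicts, each with a 'nodes' entry
        self.convert_to_ord = convert_to_ord  # True if nodes are stored as ordinals

    def _to_float(self, entry):
        # Ordinal k maps to the column's k-th smallest value (0-based ranks assumed)
        xs, ys = np.sort(self.data[:, 0]), np.sort(self.data[:, 1])
        xmin, xmax, ymin, ymax = (int(v) for v in entry["nodes"])
        return {**entry, "nodes": np.array([xs[xmin], xs[xmax], ys[ymin], ys[ymax]])}

    def __getitem__(self, key):
        # Default access: convert back to floats only if ordinals were used
        entry = self.partitions[key]
        return self._to_float(entry) if self.convert_to_ord else entry

    def get_ordinal(self, key):
        # Plays the role of getOrdinal: return the stored entry unconverted
        return self.partitions[key]

    def __iter__(self):
        # Default iteration: each entry converted exactly as in __getitem__
        return (self[i] for i in range(len(self.partitions)))

    def iter_ordinal(self):
        # Plays the role of iterOrdinal: iterate without converting
        return iter(self.partitions)


store = OrdinalStore([[0.5, 1.2], [0.1, 0.3], [0.9, 0.7]],
                     [{"nodes": np.array([0, 1, 1, 2])}])
print(store[0]["nodes"])              # → [0.1 0.5 0.7 1.2]
print(store.get_ordinal(0)["nodes"])  # → [0 1 1 2]
```

When convert_to_ord is False, __getitem__ and get_ordinal return the same stored entry. This mirrors the statement that getOrdinal then has the same effect as self[key].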
- plot()
Plots the partitions in either ordinal or float coordinates, whichever is stored in the partition bucket when it is called.
- return_partition_DV(data, borders, r=2, alpha=0.05, c=0, nmin=5)
Recursive method that partitions the data based on the DV method.
- Parameters:
data (np.array) – A many-by-2 numpy array that contains all the data in ordinal format.
borders (dict) – A dictionary whose ‘nodes’ entry is a numpy array of Xmin, Xmax, Ymin, Ymax.
r (int) – The number of partitions to split into in each direction (e.g., r=2 means each partition is recursively split into a 2-by-2 grid of partitions).
alpha (float) – The significance level of the independence test used to decide whether to stop partitioning.
c (int) – Parameter for an exit criterion. Partitioning stops if min(width of partition, height of partition) < max(width of bounding box, height of bounding box)/c.
nmin (int) – Minimum average number of points per partition required to keep the recursion going. The default is 5 because the chi-square test becomes unreliable with fewer than 5 points per partition, so we recommend choosing nmin >= 5.
- Returns:
List of dictionaries. Each dictionary corresponds to a partition and contains ‘nodes’, a numpy array of Xmin, Xmax, Ymin, Ymax of the partition, and ‘npts’, the number of points in the partition.
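The following is a minimal sketch of the recursion described above, under several assumptions. The name dv_partition is hypothetical. The stopping test is a chi-square test, computed with scipy.stats.chisquare, of whether the points are spread uniformly over the r-by-r sub-cells. That test is our reading of the independence criterion. The sketch also treats c == 0 as disabling the size criterion, since max(...)/0 is undefined. The exact tests in return_partition_DV may differ.

```python
import numpy as np
from scipy.stats import chisquare


def dv_partition(points, nodes, r=2, alpha=0.05, c=0, nmin=5, bbox_size=None):
    """Hypothetical DV-style recursive partitioning of ordinal 2D data.

    points: many-by-2 array of the ordinal points inside `nodes`.
    nodes:  [Xmin, Xmax, Ymin, Ymax] of the current cell.
    Returns a list of {'nodes': ..., 'npts': ...} dicts, one per leaf cell.
    """
    xmin, xmax, ymin, ymax = nodes
    if bbox_size is None:  # top-level call: remember the bounding box size
        bbox_size = max(xmax - xmin, ymax - ymin)
    leaf = [{"nodes": np.array(nodes, dtype=float), "npts": len(points)}]

    # Exit: too few points on average per sub-cell for a reliable chi-square test
    if len(points) / r**2 < nmin:
        return leaf
    # Exit: cell is small relative to the bounding box (assumed disabled when c == 0)
    if c > 0 and min(xmax - xmin, ymax - ymin) < bbox_size / c:
        return leaf

    # Assign each point to one cell of an r-by-r grid (half-open bins, last bin closed)
    xedges = np.linspace(xmin, xmax, r + 1)
    yedges = np.linspace(ymin, ymax, r + 1)
    xi = np.clip(np.searchsorted(xedges, points[:, 0], side="right") - 1, 0, r - 1)
    yi = np.clip(np.searchsorted(yedges, points[:, 1], side="right") - 1, 0, r - 1)
    counts = np.zeros((r, r))
    np.add.at(counts, (xi, yi), 1)

    # Exit: counts are consistent with a uniform spread over the sub-cells
    _, pvalue = chisquare(counts.ravel())
    if pvalue > alpha:
        return leaf

    # Otherwise recurse into every sub-cell with the points assigned to it
    parts = []
    for i in range(r):
        for j in range(r):
            mask = (xi == i) & (yi == j)
            sub = [xedges[i], xedges[i + 1], yedges[j], yedges[j + 1]]
            parts += dv_partition(points[mask], sub, r, alpha, c, nmin, bbox_size)
    return parts


rng = np.random.default_rng(0)
raw = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 0.5, (200, 2))])
ordinal = np.argsort(np.argsort(raw, axis=0), axis=0)  # 0-based ranks per column
n = len(ordinal)
parts = dv_partition(ordinal, [0, n - 1, 0, n - 1])
print(len(parts), sum(p["npts"] for p in parts))
```

Each point is passed to exactly one sub-cell, so the leaf counts always add up to the number of input points. The sketch relies on ordinal input (distinct ranks in each column) and nmin >= 1 to terminate. Once a cell is narrower than one rank it holds at most one point, which triggers the nmin exit.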
- return_partition_clustering(data, clusterAlg=<class 'sklearn.cluster._kmeans.KMeans'>, num_clusters=5, weights=None, boxOption='boundPoints', boxSize=2)
Partitioning method based on clustering algorithms. The data are first clustered, and the cluster centers and labels are then used to determine the partitions.
- Parameters:
data (np.array) – A many-by-2 numpy array that contains all the original data (not ordinals).
clusterAlg (class) – The clustering algorithm to use. Currently the only options are KMeans and MiniBatchKMeans from scikit-learn.
num_clusters (int) – The number of clusters, which is also the number of partitions the space is divided into.
weights (np.array) – An array of the same length as data containing point weights, used for weighted clustering.
boxOption (str) – Specifies how to choose the boxes based on cluster centers. The only option currently is “boundPoints”, which takes the bounding box of all data points assigned to each cluster center. Additional options may be added in the future.
boxSize (int) – Currently unused.
- Returns:
List of dictionaries. Each dictionary corresponds to a partition and contains ‘nodes’, a numpy array of Xmin, Xmax, Ymin, Ymax of the partition, and ‘center’, the center of the cluster for that partition.
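For the only documented box option, “boundPoints”, the procedure is to run k-means and then take the bounding box of each cluster's members. The sketch below calls scikit-learn's KMeans directly. The wrapper name cluster_partition and its seed argument are hypothetical, and this is not the teaspoon implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_partition(data, num_clusters=5, weights=None, seed=0):
    """Hypothetical sketch of the 'boundPoints' box option.

    Clusters the many-by-2 float data with k-means, then takes the bounding box
    of the points assigned to each cluster as that cluster's partition.
    """
    model = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    model.fit(data, sample_weight=weights)  # weights enable weighted clustering
    parts = []
    for k in range(num_clusters):
        members = data[model.labels_ == k]
        if len(members) == 0:  # guard against an empty cluster
            continue
        nodes = np.array([members[:, 0].min(), members[:, 0].max(),
                          members[:, 1].min(), members[:, 1].max()])
        parts.append({"nodes": nodes, "center": model.cluster_centers_[k]})
    return parts


rng = np.random.default_rng(1)
data = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in (0.0, 3.0, 6.0)])
parts = cluster_partition(data, num_clusters=3)
for p in parts:
    print(p["nodes"], p["center"])
```

Each box is just the bounding box of one cluster's members. As a result, boxes from neighboring clusters can overlap and need not cover the whole plane, unlike the cells of a recursive grid split.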