sparsifier

Provides all the utilities needed for generating prediction_tasks via simulation.

class ConstantSuffixRemover(n: int = 10, epsilon: float = 0.001, mode: str = 'relative')

Bases: Sparsifier

A sparsifier which removes constant suffixes.

frac

The relative amount of samples to keep from the incoming signal.

0 <= frac <= 1.
Raises:
  • TypeError – If frac is not a float.

  • ValueError – If frac not in the interval [0, 1].

Inits the ConstantSuffixRemover.

Parameters:
  • n – The minimum length of the suffix to remove.

  • epsilon – Relative size of the epsilon-neighbourhood of the suffix.

  • mode – Whether the epsilon-neighbourhood is “relative” or “absolute”.

Raises:
  • TypeError – If n is not an int.

  • ValueError – If n is negative or epsilon is negative.

  • TypeError – If epsilon is neither float nor int.

  • ValueError – If mode is neither “absolute” nor “relative”.

sparsify(signal: DataFrame) → DataFrame

Sparsifies the signal by removing constant suffixes.

Parameters:

signal – The signal to sparsify.

Returns:

The sparsified signal.
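The difference between the “relative” and “absolute” modes can be illustrated with a small sketch. This is not the library's implementation: it works on a plain list instead of a DataFrame, the helper name `remove_constant_suffix` is hypothetical, and it assumes that a sample belongs to the suffix when it lies within the tolerance of the last value, and that the whole suffix is dropped once it spans at least n samples.

```python
def remove_constant_suffix(values, n=10, epsilon=0.001, mode="relative"):
    """Drop a trailing run of near-constant values spanning at least n samples.

    Assumption: "near-constant" means within `tolerance` of the last value,
    where the tolerance is epsilon * |last| (relative) or epsilon (absolute).
    """
    if mode not in ("relative", "absolute"):
        raise ValueError("mode must be 'relative' or 'absolute'")
    if not values:
        return list(values)
    last = values[-1]
    tolerance = epsilon * abs(last) if mode == "relative" else epsilon
    # Walk backwards to find where the near-constant tail begins.
    start = len(values)
    while start > 0 and abs(values[start - 1] - last) <= tolerance:
        start -= 1
    # Suffixes shorter than n are left untouched.
    if len(values) - start < n:
        return list(values)
    return list(values[:start])


signal = [1.0, 2.0, 4.0, 5.0, 5.0, 5.0, 5.0]
print(remove_constant_suffix(signal, n=3))                             # [1.0, 2.0, 4.0]
print(remove_constant_suffix(signal, n=3, epsilon=1.0, mode="absolute"))  # [1.0, 2.0]
print(remove_constant_suffix(signal, n=10))                            # unchanged
```

With an absolute epsilon of 1.0, the value 4.0 also falls inside the neighbourhood of 5.0, so the sketch treats it as part of the suffix.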

class IntervalSparsifier(*kinetic_parameters: tuple[simba_ml.simulation.sparsifier.sparsifier.Sparsifier, Union[int, str]])

Bases: Sparsifier

A Sparsifier that sparsifies intervals with different Sparsifiers.

The IntervalSparsifier takes sparsifiers and interval endings as arguments. Each sparsifier is applied to its corresponding interval.

Inits the IntervalSparsifier.

Parameters:

*kinetic_parameters – Pairs of (sparsifier, end_of_interval), where end_of_interval marks the end of the interval to which the sparsifier is applied. end_of_interval can be given either explicitly as an integer or relative to the length of the signal as a float.

Raises:
  • ValueError – If interval endings are neither ints nor floats in range [0, 1].

  • TypeError – If Sparsifiers are not of type Sparsifier.

Examples

>>> import pandas as pd
>>> from simba_ml.simulation import sparsifier
>>> signal = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
>>> sparsifier.interval_sparsifier.IntervalSparsifier(
...     (sparsifier.random_sample_sparsifier.RandomSampleSparsifier(0), 2),
...     (sparsifier.random_sample_sparsifier.RandomSampleSparsifier(1), 7),
...     (sparsifier.random_sample_sparsifier.RandomSampleSparsifier(0), 11),
... ).sparsify(signal).sort_index()
   a
2  3
3  4
4  5
5  6
6  7
>>> sparsifier.interval_sparsifier.IntervalSparsifier(
...     (sparsifier.random_sample_sparsifier.RandomSampleSparsifier(0),
...         0.2),
...     (sparsifier.random_sample_sparsifier.RandomSampleSparsifier(1),
...         0.5),
...     (sparsifier.random_sample_sparsifier.RandomSampleSparsifier(0),
...         1.0),
... ).sparsify(signal).sort_index()
   a
2  3
3  4
4  5
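How the interval endings in the two examples above map to row ranges can be sketched as follows. This is an illustration only, not the library's code: the helper name `resolve_intervals` is hypothetical, and the sketch assumes that integer endings are absolute row positions, float endings are fractions of the signal length, each interval is half-open, and endings beyond the signal are clipped to its length.

```python
def resolve_intervals(interval_ends, length):
    """Turn interval endings into half-open (start, stop) row ranges.

    Assumption: an int is an absolute row position, a float in [0, 1] is a
    fraction of `length`, and endings past the signal are clipped.
    """
    intervals = []
    start = 0
    for end in interval_ends:
        if isinstance(end, bool) or not isinstance(end, (int, float)):
            raise ValueError("interval endings must be ints or floats")
        if isinstance(end, float):
            if not 0 <= end <= 1:
                raise ValueError("float endings must lie in [0, 1]")
            # Relative ending: scale by the signal length, rounded to a row.
            stop = round(end * length)
        else:
            stop = end
        stop = min(stop, length)
        intervals.append((start, stop))
        start = stop
    return intervals


print(resolve_intervals([2, 7, 11], 10))       # [(0, 2), (2, 7), (7, 10)]
print(resolve_intervals([0.2, 0.5, 1.0], 10))  # [(0, 2), (2, 5), (5, 10)]
```

Under these assumptions the middle ranges (2, 7) and (2, 5) are exactly the rows that survive in the two examples, because only the middle sparsifier keeps all of its samples.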
sparsify(signal: DataFrame) → DataFrame

Removes some (1-frac) samples chosen with a uniform random distribution.

Parameters:

signal – The signal to sparsify.

Returns:

The sparsified signal.

Return type:

DataFrame

class KeepExtremeValuesSparsifier(sparsifier: Sparsifier, lower_bound: float = 0.1, upper_bound: float = 0.1)

Bases: Sparsifier

A Sparsifier that keeps extreme values.

Inits the KeepExtremeValuesSparsifier.

Parameters:
  • sparsifier – The sparsifier to apply to the signal.

  • lower_bound – The fraction of timestamps to keep because their values are among the lowest.

  • upper_bound – The fraction of timestamps to keep because their values are among the highest.

Raises:
  • ValueError – If lower_bound or upper_bound is not in range [0, 1] or lower_bound > upper_bound.

  • TypeError – If lower_bound or upper_bound is not a float.

Examples

>>> import pandas as pd
>>> from simba_ml.simulation import sparsifier
>>> signal = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
>>> sparsifier.keep_extreme_values_sparsifier.KeepExtremeValuesSparsifier(
...     sparsifier.random_sample_sparsifier.RandomSampleSparsifier(0)
... ).sparsify(signal).sort_index()
    a
0   1
9  10
>>> signal = pd.DataFrame({
...     "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
...     "b": [1, 3, 5, 7, 9, 10, 8, 6, 5, 2]})
>>> sparsifier.keep_extreme_values_sparsifier.KeepExtremeValuesSparsifier(
...     sparsifier.random_sample_sparsifier.RandomSampleSparsifier(0),
... ).sparsify(signal).sort_index()
    a   b
0   1   1
5   6  10
9  10   2
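The rows kept in the examples above can be reproduced with a small plain-Python sketch. It is not the library's implementation: the helper names `extreme_indices` and `keep_extremes` are hypothetical, columns are plain lists, and the sketch assumes that the bounds are turned into row counts per column and that the kept rows are the union over all columns. In the examples the wrapped sparsifier keeps no samples, so only the extreme rows remain.

```python
def extreme_indices(column, lower_bound=0.1, upper_bound=0.1):
    """Return sorted positions of the smallest and largest values in a column.

    Assumption: each bound is a fraction of the number of rows, truncated to
    a whole row count.
    """
    n = len(column)
    # Row positions ordered by the value they hold, smallest first.
    order = sorted(range(n), key=lambda i: column[i])
    n_low = int(lower_bound * n)
    n_high = int(upper_bound * n)
    lowest = order[:n_low]
    highest = order[n - n_high:]
    return sorted(set(lowest) | set(highest))


def keep_extremes(columns, lower_bound=0.1, upper_bound=0.1):
    """Union of the extreme row positions over all columns."""
    keep = set()
    for values in columns.values():
        keep.update(extreme_indices(values, lower_bound, upper_bound))
    return sorted(keep)


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [1, 3, 5, 7, 9, 10, 8, 6, 5, 2]
print(extreme_indices(a))              # [0, 9]
print(keep_extremes({"a": a, "b": b})) # [0, 5, 9]
```

Row 5 is kept because it holds the maximum of column b, even though its value in column a is not extreme.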
sparsify(signal: DataFrame) → DataFrame

Removes some (1-frac) samples chosen with a uniform random distribution.

Parameters:

signal – The signal to sparsify.

Returns:

The sparsified signal.

Return type:

DataFrame

class NoSparsifier

Bases: Sparsifier

A dummy sparsifier that just returns the incoming signal.

sparsify(signal: DataFrame) → DataFrame

Pretends to sparsify, but does not remove any samples.

Parameters:

signal – The signal to sparsify.

Returns:

The (not) sparsified signal.

Return type:

pd.DataFrame

class RandomSampleSparsifier(frac: Union[float, int] = 0.5)

Bases: Sparsifier

Removes some relative amount of the given samples.

frac

The relative amount of samples to keep from the incoming signal. 0 <= frac <= 1.

Raises:
  • TypeError – If frac is not a float.

  • ValueError – If frac not in the interval [0, 1].

Inits the RandomSampleSparsifier.

Parameters:

frac – The relative amount of samples to keep from the incoming signal.

sparsify(signal: DataFrame) → DataFrame

Removes some (1-frac) samples chosen with a uniform random distribution.

Parameters:

signal – The signal to sparsify.

Returns:

The sparsified signal.

Return type:

DataFrame
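The meaning of frac and the documented error conditions can be sketched without pandas. The helper names `validate_frac` and `sample_positions` are hypothetical, the `seed` parameter is added only to make the sketch reproducible, and rounding frac * length to the nearest whole number of rows is an assumption rather than documented behaviour.

```python
import random


def validate_frac(frac):
    """Mirror the documented checks: a number in the interval [0, 1]."""
    if isinstance(frac, bool) or not isinstance(frac, (int, float)):
        raise TypeError("frac must be a float or an int")
    if not 0 <= frac <= 1:
        raise ValueError("frac must lie in the interval [0, 1]")
    return float(frac)


def sample_positions(length, frac=0.5, seed=None):
    """Pick the row positions to keep, uniformly at random without replacement."""
    frac = validate_frac(frac)
    rng = random.Random(seed)
    # Assumption: the number of kept rows is frac * length, rounded.
    keep = round(frac * length)
    return sorted(rng.sample(range(length), keep))


print(len(sample_positions(10, 0.5, seed=0)))  # 5
print(sample_positions(10, 0))                 # []
print(sample_positions(10, 1))                 # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The two boundary cases match the way frac=0 and frac=1 are used in the IntervalSparsifier examples: one keeps nothing, the other keeps every row.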

class SequentialSparsifier(sparsifiers: list[simba_ml.simulation.sparsifier.sparsifier.Sparsifier])

Bases: Sparsifier

The SequentialSparsifier applies multiple given Sparsifiers sequentially.

noisers

A list of Sparsifiers to be applied.

Inits the SequentialSparsifier with the provided params.

Parameters:

sparsifiers – A list of Sparsifiers to be applied.

sparsify(signal: DataFrame) → DataFrame

Sparsifies the provided signal.

Parameters:

signal – The input data.

Returns:

The sparsified signal.
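Sequential application means that each sparsifier receives the output of the previous one, so the order of the list matters. A minimal sketch of that idea, using plain functions on lists in place of Sparsifier objects on DataFrames (the names `apply_sequentially`, `drop_first`, and `keep_every_other` are hypothetical):

```python
def apply_sequentially(sparsifiers, signal):
    """Feed the signal through each sparsifying step in list order."""
    for sparsify in sparsifiers:
        signal = sparsify(signal)
    return signal


def drop_first(signal):
    """Hypothetical step: remove the first sample."""
    return signal[1:]


def keep_every_other(signal):
    """Hypothetical step: keep samples at even positions."""
    return signal[::2]


signal = list(range(1, 11))
print(apply_sequentially([drop_first, keep_every_other], signal))  # [2, 4, 6, 8, 10]
print(apply_sequentially([keep_every_other, drop_first], signal))  # [3, 5, 7, 9]
```

Swapping the two steps changes the result, which is why the sparsifiers are passed as an ordered list.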

class Sparsifier

Bases: ABC

A sparsifier sparsifies an input signal by removing samples.

abstract sparsify(signal: DataFrame) → DataFrame

Removes half of the samples chosen with a uniform random distribution.

Parameters:

signal – The signal to sparsify.
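Because sparsify is abstract, a concrete sparsifier has to override it. The pattern can be sketched with a local stand-in for the base class; the `Sparsifier` class below is only a stand-in defined for this sketch, `EveryKthSparsifier` is a hypothetical subclass, and a plain list is used in place of a DataFrame so that the sketch runs without third-party packages.

```python
from abc import ABC, abstractmethod


class Sparsifier(ABC):
    """Local stand-in for the abstract base class (not imported from simba_ml)."""

    @abstractmethod
    def sparsify(self, signal):
        """Return the signal with some samples removed."""


class EveryKthSparsifier(Sparsifier):
    """Hypothetical sparsifier that keeps every k-th sample."""

    def __init__(self, k):
        if not isinstance(k, int) or isinstance(k, bool):
            raise TypeError("k must be an int")
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k

    def sparsify(self, signal):
        # Slicing with a step keeps positions 0, k, 2k, ...
        return signal[::self.k]


print(EveryKthSparsifier(3).sparsify(list(range(10))))  # [0, 3, 6, 9]
```

Instantiating the abstract stand-in directly raises a TypeError, which is the standard behaviour of classes with unimplemented abstract methods.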

simba_ml.simulation.sparsifier.constant_suffix_remover

Provides a Sparsifier which removes constant suffixes.

simba_ml.simulation.sparsifier.interval_sparsifier

Removes a given relative amount of samples from a signal.

simba_ml.simulation.sparsifier.keep_extreme_values_sparsifier

Removes a given relative amount of samples from a signal.

simba_ml.simulation.sparsifier.no_sparsifier

Provides a Dummy-Sparsifier which removes no samples from a signal.

simba_ml.simulation.sparsifier.random_sample_sparsifier

Removes a given relative amount of samples from a signal.

simba_ml.simulation.sparsifier.sequential_sparsifier

Module providing the SequentialSparsifier.

simba_ml.simulation.sparsifier.sparsifier

Provides an abstract Sparsifier.