Flag for enabling batch EM
maxIter for batch EM
Tolerance to stop iterations in batch EM
Build the initialMixtureModel column from distribution specific parameters
Creates a copy of this instance with the same UID and some extra params.
Decaying stepSize
Param for event time column name, which marks the event time of the received measurements. If set, the measurements will be processed in ascending order according to event time.
Getter for batch EM flag
Getter for batch train max iter parameter
Getter for batch train tolerance
Get decay rate as udf
Getter for event time column parameter
Getter for initial covariances parameter
Getter for initial covariances column
Getter for initial means
Getter for initial means column
Getter for initial mixture model column
Getter for initialWeights parameter
Getter for initialWeightsCol parameter
Minibatch param getter
MinibatchSizeCol param getter
Getter for sample column.
Getter for state key column name parameter
Getter for state key column
Getter for state timeout duration parameter
Getter for stepSize parameter
Getter for stepSizeCol parameter
Getter for timeout mode
TimeoutMode
Getter for update holdout param
Getter for update holdout column param
Getter for watermark duration parameter
Initial covariances as a nested array, each matrix stored in column-major order, with dimensions mixtureCount x sampleSize**2
Initial covariances from dataframe column
Initial means of the mixtures
Initial means from dataframe column
Initial mixture model as a struct column
Initial weight of the mixtures
Initial weights as dataframe column
Number of samples in a batch
Number of samples in a batch from dataframe column
Number of mixture components
Param for sample column.
Sets the maximum iterations for batch EM mode
Sets the stopping criteria in terms of loglikelihood improvement for batch EM mode
Sets the step size as a decaying function rather than a constant step size, which might be preferred for batch training. If set, the step size will be replaced with the output of the following function:
stepSize = pow(2 + kIter, -decayRate)
Where kIter is incremented by 1 at each minibatch.
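The decay function above can be sketched in a few lines of plain Scala. This is only an illustration of the formula, not the transformer's own code: the object name `StepSizeDecay`, the helper `schedule`, and the choice of starting kIter at 0 are assumptions made for the example.

```scala
object StepSizeDecay {
  // stepSize = pow(2 + kIter, -decayRate), as given in the description.
  def decayedStepSize(kIter: Long, decayRate: Double): Double =
    math.pow(2.0 + kIter, -decayRate)

  // Step sizes for the first n minibatches.
  // Assumption: kIter starts at 0 and grows by 1 per minibatch.
  def schedule(n: Int, decayRate: Double): Seq[Double] =
    (0 until n).map(k => decayedStepSize(k.toLong, decayRate))
}
```

With a positive decayRate the step size shrinks at every minibatch, so later minibatches move the estimate less than earlier ones.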
Enables batch EM mode. When enabled, the transform method performs iterative EM training with multiple passes, as opposed to online training with a single pass.
Disabled by default
Sets the event time column in the input DataFrame for event time based state timeout.
Sets the initial covariance matrices of the mixtures as a nested array of doubles. The dimensions of the array should be mixtureCount x sampleSize**2
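As an illustration of the expected shape, the sketch below flattens square covariance matrices into column-major arrays of length sampleSize**2, one per mixture component. The object and method names (`CovarianceLayout`, `toColumnMajor`, `initialCovariances`) are hypothetical helpers for this example and are not part of the documented API.

```scala
object CovarianceLayout {
  // Flattens a square matrix, given as an array of rows, into a column-major array.
  def toColumnMajor(rows: Array[Array[Double]]): Array[Double] = {
    val n = rows.length
    require(rows.forall(_.length == n), "matrix must be square")
    // Column-major order: columns vary in the outer loop, rows in the inner loop.
    (for (col <- 0 until n; row <- 0 until n) yield rows(row)(col)).toArray
  }

  // One flattened matrix per mixture component: mixtureCount x sampleSize**2.
  def initialCovariances(mats: Seq[Array[Array[Double]]]): Array[Array[Double]] =
    mats.map(toColumnMajor).toArray
}
```

Covariance matrices are symmetric, so row-major and column-major flattening coincide for valid inputs; the explicit ordering only matters if an asymmetric matrix is passed by mistake.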
Sets the initial covariance matrices of the mixtures from dataframe column. Overrides the value set by setInitialCovariances
Sets the initial mean vectors of the mixtures as a nested array of doubles. The dimensions of the array should be mixtureCount x sample vector size
Sets the initial means from dataframe column. Overrides the value set by setInitialMeans
Sets the initial mixture model directly from dataframe column
Sets the initial weights of the mixtures. The weights should sum up to 1.0.
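Since the weights should sum to 1.0, it can help to normalize raw, non-negative scores before passing them on. The following is a minimal sketch; `MixtureWeights.normalize` is a hypothetical helper, not something the documented class provides.

```scala
object MixtureWeights {
  // Rescales non-negative raw scores so that they sum to 1.0.
  def normalize(raw: Array[Double]): Array[Double] = {
    require(raw.nonEmpty && raw.forall(_ >= 0.0), "weights must be non-negative")
    val total = raw.sum
    require(total > 0.0, "at least one weight must be positive")
    raw.map(_ / total)
  }
}
```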
Sets the initial weights of the mixtures from dataframe column. Column should contain array of doubles. Overrides the value set by setInitialWeights.
Sets the minibatch size for batching samples together in the online EM algorithm. An estimate is produced once per batch. Larger batches increase stability at the cost of a larger memory footprint.
Default is 1
Sets the minibatch size from dataframe column rather than a constant minibatch size across all states. Overrides the value set by setMinibatchSize.
Sets the sample column for the mixture model inputs. Depending on the mixture distribution, sample type should be different.
- Bernoulli => Boolean
- Poisson => Long
- MultivariateGaussian => Vector
Sets the state key column. Each value in the column should uniquely identify a stateful transformer. Each unique value will result in a separate state.
Sets the state timeout duration for all states, only valid when state timeout mode is not 'none'. Must be a valid duration string, such as '10 minutes'.
Sets the state timeout mode. Supported values are 'none', 'process' and 'event'. Enabling state timeout will clear the state after a certain timeout duration which can be set. If a state receives measurements after it times out, the state will be initialized as if it received no measurements.
- 'none': No state timeout, state is kept indefinitely.
- 'process': Process time based state timeout, state will be cleared if no measurements are received for a duration based on processing time. Affects all states. Timeout duration must be set with setStateTimeoutDuration.
- 'event': Event time based state timeout, state will be cleared if no measurements are received for a duration based on event time determined by watermark. Affects all states. Timeout duration must be set with setStateTimeoutDuration. Additionally, event time column and its watermark duration must be set with setEventTimeCol and setWatermarkDuration. Note that this will result in dropping measurements occurring later than the watermark.
Default is 'none'
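The requirements of each timeout mode can be summarized as a small lookup. The sketch below only encodes the rules stated in the list above; `TimeoutModeCheck` and its methods are hypothetical names for illustration, and the transformer itself may validate its parameters differently.

```scala
object TimeoutModeCheck {
  // Setters that the description says must be called for each timeout mode.
  def requiredSetters(mode: String): Set[String] = mode match {
    case "none"    => Set.empty
    case "process" => Set("setStateTimeoutDuration")
    case "event"   => Set("setStateTimeoutDuration", "setEventTimeCol", "setWatermarkDuration")
    case other     => throw new IllegalArgumentException(s"Unsupported timeout mode: $other")
  }

  // Setters still missing, given the names of the setters already called.
  def missingSetters(mode: String, called: Set[String]): Set[String] =
    requiredSetters(mode) -- called
}
```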
Sets the step size parameter, which weights the current parameter of the model against the old parameter. A step size of 1.0 means ignore the old parameter, whereas a step size of 0 means ignore the current parameter. Values closer to 1.0 will increase speed of convergence, but might have adverse effects on stability. In an online setting, it is advised to set it close to 0.0.
Default is 0.1
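One way to read the step size description is as a convex combination of the old and the current parameter. The sketch below assumes exactly that interpretation; it matches the two stated endpoints (1.0 ignores the old value, 0 ignores the current value), but the exact quantity the transformer blends is not spelled out here, and `StepSizeBlend.blend` is a hypothetical name.

```scala
object StepSizeBlend {
  // Assumed form: new = (1 - stepSize) * old + stepSize * current.
  def blend(old: Double, current: Double, stepSize: Double): Double = {
    require(stepSize >= 0.0 && stepSize <= 1.0, "stepSize must be in [0, 1]")
    (1.0 - stepSize) * old + stepSize * current
  }
}
```

With the default of 0.1, each update moves the parameter one tenth of the way from the old value towards the current one.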
Sets the step size from dataframe column, which would allow setting different step sizes across measurements. Overrides the value set by setStepSize.
Sets the update holdout parameter which controls after how many samples the mixture will start calculating estimates. Preventing updates during the first few samples might be preferred for stability.
Sets the update holdout parameter from dataframe column rather than a constant value across all states. Overrides the value set by setUpdateHoldout.
Sets the watermark duration for all states, only valid when state timeout mode is 'event'. Must be a valid duration string, such as '10 minutes'.
Param for state key column. State keys uniquely identify each state in stateful transformers, thus controlling the number of states and the degree of parallelization.
Param for state timeout duration.
Controls the inertia of the current parameter.
stepSize as dataframe column
Param for timeout mode, controlling the eviction of states which receive no measurement for a certain duration
Transforms the dataframe of samples to a dataframe of mixture parameter estimates.
Applies the transformation to dataset schema
Update holdout parameter
Update holdout parameter from dataframe column
Param for watermark duration as string, measured from the eventTimeCol column. If set, measurements will be processed in append mode with the specified watermark duration.
Online multivariate Gaussian mixture estimation with a stateful transformer, based on the Online Expectation-Maximisation paper by Cappé (2011).
Outputs an estimate for each input sample in a single pass, by replacing the E-step in EM with a stochastic E-step.
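To make the single-pass idea concrete, here is a minimal sketch of an online EM update for a univariate Gaussian mixture in plain Scala. It is a simplified illustration under several assumptions, not the transformer's implementation: it is univariate rather than multivariate, uses a constant step size with a minibatch size of 1, has no numerical safeguards, and all names (`OnlineEmSketch`, `Stats`, `Component`, `update`, `run`) are made up for the example. The stochastic E-step blends running sufficient statistics with the statistics of the newest sample, and the M-step reads the parameters back from those statistics.

```scala
object OnlineEmSketch {
  // Running sufficient statistics of one component: averages of r, r*x and r*x^2,
  // where r is the responsibility of the component for a sample x.
  final case class Stats(s0: Double, s1: Double, s2: Double)
  final case class Component(weight: Double, mean: Double, variance: Double)

  private def gaussianPdf(x: Double, mean: Double, variance: Double): Double =
    math.exp(-0.5 * (x - mean) * (x - mean) / variance) / math.sqrt(2.0 * math.Pi * variance)

  // Statistics that are consistent with the initial mixture parameters.
  def initStats(model: Seq[Component]): Seq[Stats] =
    model.map(c => Stats(c.weight, c.weight * c.mean, c.weight * (c.variance + c.mean * c.mean)))

  // M-step: recover mixture parameters from the running statistics.
  def toModel(stats: Seq[Stats]): Seq[Component] =
    stats.map { s =>
      val mean = s.s1 / s.s0
      Component(s.s0, mean, s.s2 / s.s0 - mean * mean)
    }

  // Stochastic E-step for a single sample: compute responsibilities under the
  // current model, then move the statistics towards them by stepSize.
  def update(stats: Seq[Stats], x: Double, stepSize: Double): Seq[Stats] = {
    val joint = toModel(stats).map(c => c.weight * gaussianPdf(x, c.mean, c.variance))
    val total = joint.sum
    stats.zip(joint).map { case (s, j) =>
      val r = j / total
      Stats(
        (1.0 - stepSize) * s.s0 + stepSize * r,
        (1.0 - stepSize) * s.s1 + stepSize * r * x,
        (1.0 - stepSize) * s.s2 + stepSize * r * x * x)
    }
  }

  // Single pass over the samples, producing one model estimate per input sample.
  def run(initial: Seq[Component], samples: Seq[Double], stepSize: Double): Seq[Seq[Component]] =
    samples.scanLeft(initStats(initial))((s, x) => update(s, x, stepSize)).tail.map(toModel)
}
```

Because the responsibilities of a sample sum to 1, the estimated weights keep summing to 1 after every update, provided the initial weights do.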