utensil.loopflow.functions package

Submodules

utensil.loopflow.functions.basic module

Provide NodeProcessFunction for basic usage.

Example:

from utensil.loopflow.functions import basic
from utensil.loopflow.loopflow import register_node_process_functions
register_node_process_functions(basic)
utensil.loopflow.functions.basic.MISSING = MISSING

Missing token.

Used to indicate a missing value.

class utensil.loopflow.functions.basic.Dummy[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Identity function.

Returns whatever it gets.

>>> Dummy().main('anything')
'anything'
main(a: Any = MISSING)[source]
class utensil.loopflow.functions.basic.Default(default)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Implements a default behavior.

Return a default value if triggered before getting anything.

default

the default value.

>>> default = Default('my_default')

This will return the input.

>>> default.main('my_input')
'my_input'

This will return the default value.

>>> default.main()
'my_default'

main(o: Any = MISSING)[source]
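The fallback behavior described above can be sketched without the library. This is a minimal stand-in, not the utensil implementation: the names _Missing and DefaultSketch are hypothetical, and the sketch assumes the sentinel is compared by identity.

```python
from typing import Any


class _Missing:
    """Hypothetical sentinel type; one instance marks 'no value was passed'."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()  # stand-in for the module-level MISSING token


class DefaultSketch:
    """Stand-in for Default: fall back to a preset value when nothing arrives."""

    def __init__(self, default: Any) -> None:
        self.default = default

    def main(self, o: Any = MISSING) -> Any:
        # Compare by identity so that None, 0 and '' still count as real inputs.
        return self.default if o is MISSING else o


node = DefaultSketch("my_default")
print(node.main("my_input"))  # my_input
print(node.main())            # my_default
print(node.main(None))        # None
```

A dedicated sentinel, rather than None, lets None itself travel through the flow as a legitimate value.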
class utensil.loopflow.functions.basic.Add(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Add a predefined constant, i.e., n+a.

a

the constant value to be added.

>>> p = Add(3)
>>> p.main(5)
8
>>> p.main(9)
12
main(n)[source]
Parameters

n – value to which a is added.

Returns

n+a.

namedtuple utensil.loopflow.functions.basic.ConditionValue(c, v)

Bases: namedtuple()

A pair of a boolean and a value for flow control.

c

a boolean value indicating whether the condition is satisfied.

v

the value to be used.

ConditionValue(c, v)

Fields
  1.  c – Alias for field number 0

  2.  v – Alias for field number 1
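As an illustration of how such a pair might be consumed, the sketch below rebuilds an equivalent namedtuple with the standard library and branches on c. The route function is a hypothetical downstream step, not part of utensil.

```python
from collections import namedtuple

# Equivalent stand-in for the documented ConditionValue(c, v).
ConditionValue = namedtuple("ConditionValue", ["c", "v"])


def route(result: ConditionValue) -> str:
    """Hypothetical downstream step that branches on the condition flag."""
    passed, value = result  # a namedtuple unpacks like a plain tuple
    return f"continue with {value}" if passed else f"stop at {value}"


cv = ConditionValue(c=True, v=3)
print(cv.c, cv.v, cv[0], cv[1])       # True 3 True 3
print(route(cv))                      # continue with 3
print(route(ConditionValue(False, 10)))  # stop at 10
```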

class utensil.loopflow.functions.basic.LessEqual(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than or equal to a constant, i.e., b <= a.

a

the constant value to be compared with.

>>> LessEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> LessEqual(5).main(10)
ConditionValue(c=False, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b <= a, and whose v is b.

class utensil.loopflow.functions.basic.Equal(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is equal to a constant, i.e., b == a.

a

the constant value to be compared with.

>>> Equal(3).main(3)
ConditionValue(c=True, v=3)
>>> Equal(5).main(10)
ConditionValue(c=False, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b == a, and whose v is b.

class utensil.loopflow.functions.basic.GreaterEqual(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than or equal to a constant, i.e., b >= a.

a

the constant value to be compared with.

>>> GreaterEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> GreaterEqual(15).main(10)
ConditionValue(c=False, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b >= a, and whose v is b.

class utensil.loopflow.functions.basic.LessThan(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than a constant, i.e., b < a.

a

the constant value to be compared with.

>>> LessThan(3).main(3)
ConditionValue(c=False, v=3)
>>> LessThan(15).main(10)
ConditionValue(c=True, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b < a, and whose v is b.

class utensil.loopflow.functions.basic.GreaterThan(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than a constant, i.e., b > a.

a

the constant value to be compared with.

>>> GreaterThan(3).main(3)
ConditionValue(c=False, v=3)
>>> GreaterThan(5).main(10)
ConditionValue(c=True, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b > a, and whose v is b.
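The five comparison classes share one pattern: compare the input b against a stored constant a and return the pair (result, b). A compact sketch of that pattern, assuming nothing about the library internals, can use the standard operator module. CompareSketch is a hypothetical name.

```python
import operator
from collections import namedtuple
from typing import Any, Callable

ConditionValue = namedtuple("ConditionValue", ["c", "v"])


class CompareSketch:
    """Hypothetical generic comparison node: main(b) evaluates op(b, a)."""

    def __init__(self, op: Callable[[Any, Any], bool], a: Any) -> None:
        self.op = op
        self.a = a

    def main(self, b: Any) -> ConditionValue:
        # The input b is passed through unchanged as v.
        return ConditionValue(self.op(b, self.a), b)


print(CompareSketch(operator.le, 3).main(3))    # ConditionValue(c=True, v=3)
print(CompareSketch(operator.eq, 5).main(10))   # ConditionValue(c=False, v=10)
print(CompareSketch(operator.ge, 15).main(10))  # ConditionValue(c=False, v=10)
print(CompareSketch(operator.lt, 15).main(10))  # ConditionValue(c=True, v=10)
print(CompareSketch(operator.gt, 3).main(3))    # ConditionValue(c=False, v=3)
```

Note the argument order: the input is always the left operand, so LessThan(15).main(10) asks whether 10 < 15.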

utensil.loopflow.functions.dataflow module

Provides NodeProcessFunction classes for machine learning workflows.

Example:

from utensil.loopflow.functions import dataflow
from utensil.loopflow.loopflow import register_node_process_functions
register_node_process_functions(dataflow)
class utensil.loopflow.functions.dataflow.Feature(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]

Bases: pandas.core.series.Series

A feature of a dataset.

Feature is an individual measurable property or characteristic of a phenomenon. It can be a list of numbers or strings, with or without missing values. The length of a feature (missing values included) should equal the number of instances in the dataset.

class utensil.loopflow.functions.dataflow.Features(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: pandas.core.frame.DataFrame

A list of features.

Features is a list of Feature. It can be represented as a matrix of numbers, strings, missing values, etc.

class utensil.loopflow.functions.dataflow.Target(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]

Bases: pandas.core.series.Series

The target of a dataset.

Target is the output corresponding to the input variables. Typically, it is the variable that a supervised model tries to learn to predict, either numerical or categorical.

class utensil.loopflow.functions.dataflow.Dataset(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features)[source]

Bases: object

A dataset used to train a model or to let a model predict its target.

A pair of Target and Features. In the supervised case, use both the target and the features to train or score a model; to predict only, use only the features. The length of the target should equal the length of every feature in features, i.e., the number of instances.

>>> dataset = Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> dataset.nrows
3
>>> dataset.ncols
4
>>> bad_dataset = Dataset(
...     Target(np.random.randint(2, size=2)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> bad_dataset.nrows
Traceback (most recent call last):
...
ValueError: rows of target and that of features should be the same
target: utensil.loopflow.functions.dataflow.Target

The target of the dataset.

features: utensil.loopflow.functions.dataflow.Features

The features of the dataset.

property nrows

Number of rows/instances.

property ncols

Number of columns/features.
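The row-count check shown in the doctest can be sketched with plain Python sequences. DatasetSketch is a hypothetical stand-in that uses lists instead of pandas objects; like the doctest, it raises only when nrows is read, not at construction time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for Dataset built on plain sequences."""

    target: Sequence[float]
    features: Sequence[Sequence[float]]  # one inner sequence per row

    @property
    def nrows(self) -> int:
        # Target and features must describe the same instances.
        if len(self.target) != len(self.features):
            raise ValueError(
                "rows of target and that of features should be the same"
            )
        return len(self.target)

    @property
    def ncols(self) -> int:
        # Number of features, taken from the first row (0 if there are no rows).
        return len(self.features[0]) if self.features else 0


good = DatasetSketch([0, 1, 1], [[0.1, 0.2, 0.3, 0.4]] * 3)
print(good.nrows, good.ncols)  # 3 4
```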

class utensil.loopflow.functions.dataflow.Model[source]

Bases: object

A base model class to be trained and to predict target based on a dataset.

Before calling Model.train(), the model is untrained and should not be used to predict. After that, Model.predict() can be called to predict the Target of Features.

train(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Model[source]

Train a model.

Use self as a base model to train on dataset for a trained model.

Subclasses should override this method to provide an implementation.

>>> Model().train(Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... ))
Traceback (most recent call last):
...
NotImplementedError

Parameters

dataset (Dataset) – dataset to be trained on.

Returns

A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Target[source]

Predict the target.

A model returned from Model.train() can predict the Target for a given Features.

Subclasses should override this method to provide an implementation.

>>> Model().predict(Features(np.random.random(size=(3, 4))))
Traceback (most recent call last):
...
NotImplementedError

Parameters

features (Features) – the features used to predict the Target.

Returns

The prediction of Target.
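The train/predict contract can be illustrated with a toy subclass. The sketch below is an assumption-laden stand-in: ModelSketch and MeanModel are hypothetical names, and train takes a target list and a feature list directly instead of a Dataset so that the example needs no third-party packages.

```python
from typing import List, Optional, Sequence


class ModelSketch:
    """Hypothetical base class mirroring the documented contract."""

    def train(self, target: Sequence[float],
              features: Sequence[Sequence[float]]) -> "ModelSketch":
        raise NotImplementedError

    def predict(self, features: Sequence[Sequence[float]]) -> List[float]:
        raise NotImplementedError


class MeanModel(ModelSketch):
    """Toy model that always predicts the mean of the training target."""

    def __init__(self) -> None:
        self.mean: Optional[float] = None

    def train(self, target, features):
        self.mean = sum(target) / len(target)
        return self  # train returns the trained model

    def predict(self, features):
        if self.mean is None:
            raise RuntimeError("model is untrained; call train() first")
        # One prediction per row of features.
        return [self.mean for _ in features]


model = MeanModel().train([1, 2, 3], [[0], [0], [0]])
print(model.predict([[1], [2]]))  # [2.0, 2.0]
```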

class utensil.loopflow.functions.dataflow.SklearnModel(model)[source]

Bases: utensil.loopflow.functions.dataflow.Model

A wrapper for sklearn models.

>>> from sklearn.linear_model import LinearRegression
>>> model = SklearnModel(LinearRegression())
>>> target = Target([1, 2, 3])
>>> features = Features([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> model = model.train(Dataset(target, features))
>>> model.predict(features + 1)
0    1.25
1    2.25
2    3.25
dtype: float64
train(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Model[source]

Train a model.

Use self as a base model to train on dataset for a trained model. Typically the fit method of the sklearn model is used.

Parameters

dataset (Dataset) – dataset to be trained on.

Returns

A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Target[source]

Predict the target.

A model returned from Model.train() can predict the Target for a given Features. Typically the predict method of the sklearn model is used.

Parameters

features (Features) – the features used to predict the Target.

Returns

The prediction of Target.

class utensil.loopflow.functions.dataflow.LoadData(dformat: str, url: str, target: str, features: Dict[int, str])[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Load a dataset from a URL.

The URL can be a path. The data format can be SVMLIGHT.

dformat

Data format. The only valid option is SVMLIGHT.

Todo

More formats are needed

  1. CSV

  2. HDF5

Type

str

url

URL for the dataset. Should be a path or a URL with the scheme http, https, or file.

Todo

More types are needed

  1. sklearn data.

Type

str

target

The column of the dataset treated as a target.

Type

str

features

A mapping from the 0-based index of a column to its name. This is useful when the dataset does not contain its own column names, as in the svmlight format.

Type

dict[int, str]

main() utensil.loopflow.functions.dataflow.Dataset[source]
class utensil.loopflow.functions.dataflow.FilterRows(filter_by: Dict[str, Any])[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Filter rows of Dataset.

Filter rows of dataset by the value of its Target.

filter_by

Specifies which column to filter by and which values to keep. Typical usage is to filter TARGET with a list of values. For example, filter_by={"TARGET": [1, 2]} filters the dataset so that the target column contains only 1 or 2.

Type

dict[str, Any]

main(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Dataset[source]
Parameters

dataset (Dataset) – the dataset to be filtered.

Returns

A filtered dataset.
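Filtering by target value amounts to keeping the rows whose target appears in an allowed list, and dropping the matching feature rows together with the rest. A minimal sketch with plain lists follows; filter_rows is a hypothetical helper and only covers the TARGET case described above.

```python
from typing import Any, List, Sequence, Tuple


def filter_rows(
    target: Sequence[Any],
    features: Sequence[Sequence[Any]],
    keep: Sequence[Any],
) -> Tuple[List[Any], List[Sequence[Any]]]:
    """Keep only rows whose target value is in `keep` (hypothetical helper)."""
    kept = [i for i, t in enumerate(target) if t in keep]
    # Target and features are filtered with the same indices so rows stay aligned.
    return [target[i] for i in kept], [features[i] for i in kept]


target = [1, 3, 2, 1]
features = [[10], [30], [20], [11]]
print(filter_rows(target, features, keep=[1, 2]))
# ([1, 2, 1], [[10], [20], [11]])
```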

class utensil.loopflow.functions.dataflow.SamplingRows(number: int = MISSING, ratio: float = MISSING, stratified: bool = False, replace: bool = False, random_seed: Union[None, int] = None, return_rest: bool = False)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Sample rows of a dataset.

This method samples a dataset to a specific number of rows or to a ratio.

number

The sampled dataset will have this many rows. Ignored when ratio is set.

Type

int

ratio

The sampled dataset will have ratio * dataset.nrows rows. Takes precedence over number.

Type

float, default 1.0 if number is not set

stratified

If True, the dataset will be sampled in a stratified manner. That is, there will be the same number of rows for each category of the dataset target, if possible.

Type

bool, default False

replace

If True, the dataset will be sampled with replacement and a row may be selected multiple times. An exception is raised if replace is False and number is larger than dataset.nrows or ratio is larger than 1.

Type

bool, default False

random_seed

Random seed used to sample the dataset. It is used to set numpy.random.BitGenerator. See the NumPy documentation for more information.

Type

None or int, default None

return_rest

If False, only the sampled dataset is returned.

If True, this method will return a dictionary of two datasets,

{
    'sampled': sampled_dataset,
    'rest': rest_dataset,
}

rest_dataset contains all rows not in sampled_dataset.

Note

Even if sampled_dataset is sampled with replacement, rest_dataset does not contain duplicated rows.

Type

bool, default False

main(dataset: utensil.loopflow.functions.dataflow.Dataset) Union[utensil.loopflow.functions.dataflow.Dataset, Dict[str, utensil.loopflow.functions.dataflow.Dataset]][source]
Parameters

dataset (Dataset) – the dataset to be sampled.

Returns

A sampled dataset or a dictionary of the sampled dataset and the rest dataset.
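The interaction of number, ratio, replace, and return_rest can be sketched on row indices alone. This is an illustration under stated assumptions, not the library code: sample_rows is a hypothetical helper, it uses the standard random module in place of the NumPy generator, rounds ratio * nrows to the nearest integer, and leaves out stratified sampling.

```python
import random
from typing import List, Optional, Tuple


def sample_rows(
    nrows: int,
    number: Optional[int] = None,
    ratio: Optional[float] = None,
    replace: bool = False,
    random_seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Return (sampled row indices, rest row indices). Hypothetical helper."""
    if ratio is not None:          # ratio takes precedence over number
        size = round(ratio * nrows)
    elif number is not None:
        size = number
    else:                          # neither given: ratio defaults to 1.0
        size = nrows
    if not replace and size > nrows:
        raise ValueError("cannot draw more rows than exist without replacement")

    rng = random.Random(random_seed)
    if replace:
        sampled = [rng.randrange(nrows) for _ in range(size)]
    else:
        sampled = rng.sample(range(nrows), size)

    # The rest holds every row never drawn, each exactly once,
    # even when sampling with replacement produced duplicates.
    chosen = set(sampled)
    rest = [i for i in range(nrows) if i not in chosen]
    return sampled, rest


sampled, rest = sample_rows(10, ratio=0.5, random_seed=0)
print(len(sampled), len(rest))  # 5 5
```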

class utensil.loopflow.functions.dataflow.MakeDataset[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make a dataset using target and features.

main(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Dataset[source]
Parameters
  • target (Target) – the input target.

  • features (Features) – the input features.

Returns

A dataset consisting of target and features.

class utensil.loopflow.functions.dataflow.GetTarget[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get target from a dataset.

main(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Target[source]
Parameters

dataset (Dataset) – get target from this dataset.

Returns

The target of the dataset.

class utensil.loopflow.functions.dataflow.GetFeature(feature: str)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get feature from a dataset with a given name.

feature

This feature will be retrieved from the dataset.

Type

str

main(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Feature[source]
Parameters

dataset (Dataset) – get feature from this dataset.

Returns

The feature with the given name of the dataset.

class utensil.loopflow.functions.dataflow.MergeFeatures[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Merge a list of Feature objects into a Features object.

main(*features: utensil.loopflow.functions.dataflow.Feature) utensil.loopflow.functions.dataflow.Features[source]
Parameters

*features (list of Feature) – list of features to be merged.

Returns

A Features object containing the listed features.

class utensil.loopflow.functions.dataflow.LinearNormalize(upper: Optional[Dict[str, Any]] = None, lower: Optional[Dict[str, Any]] = None)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Perform linear normalization of a 1d array.

Linearly maps the given array from range (u1, l1) to (u2, l2).

upper

Sets u1=upper["FROM"] and u2=upper["TO"]. Each u* should be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type

dict of FROM and TO, default both MAX

lower

Sets l1=lower["FROM"] and l2=lower["TO"]. Each l* should be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type

dict of FROM and TO, default both MIN

main(arr1d: numpy.ndarray) numpy.ndarray[source]
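One way to read the mapping from (u1, l1) to (u2, l2) is the formula y = l2 + (x - l1) * (u2 - l2) / (u1 - l1). The sketch below applies it to a plain list. The helper names are hypothetical, and the handling of a partially specified dict (missing keys fall back to MAX for upper and MIN for lower) is an assumption rather than documented behavior.

```python
from typing import Dict, List, Optional, Sequence, Union

Bound = Union[float, int, str]


def resolve(bound: Bound, arr: Sequence[float]) -> float:
    """Turn 'MAX'/'MIN' into the array's maximum/minimum; pass numbers through."""
    if bound == "MAX":
        return float(max(arr))
    if bound == "MIN":
        return float(min(arr))
    return float(bound)


def linear_normalize(
    arr: Sequence[float],
    upper: Optional[Dict[str, Bound]] = None,
    lower: Optional[Dict[str, Bound]] = None,
) -> List[float]:
    """Map arr linearly from (u1, l1) to (u2, l2). Hypothetical helper."""
    upper = upper or {}
    lower = lower or {}
    u1 = resolve(upper.get("FROM", "MAX"), arr)
    u2 = resolve(upper.get("TO", "MAX"), arr)
    l1 = resolve(lower.get("FROM", "MIN"), arr)
    l2 = resolve(lower.get("TO", "MIN"), arr)
    if u1 == l1:
        raise ValueError("source range has zero width")
    return [l2 + (x - l1) * (u2 - l2) / (u1 - l1) for x in arr]


scaled = linear_normalize(
    [0, 5, 10],
    upper={"FROM": "MAX", "TO": 1},
    lower={"FROM": "MIN", "TO": 0},
)
print(scaled)  # [0.0, 0.5, 1.0]
```

With both dicts left at their defaults, the source and destination ranges coincide and the array is returned unchanged (as floats).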
class utensil.loopflow.functions.dataflow.MakeModel(method)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make an untrained model.

method

The method the model uses to train. Options are XGBOOST_REGRESSOR and XGBOOST_CLASSIFIER.

Type

str

main(model_params: Dict[str, Any]) utensil.loopflow.functions.dataflow.Model[source]
Parameters

model_params (dict) –

The parameters to create the model. Based on the method, different parameters can be set.

Returns

An untrained Model.

class utensil.loopflow.functions.dataflow.Train[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Train a model.

main(model: utensil.loopflow.functions.dataflow.Model, dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Model[source]
Parameters
  • model (Model) – The model to be trained.

  • dataset (Dataset) – The dataset to be trained on.

Returns

A trained Model.

class utensil.loopflow.functions.dataflow.Predict[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Predict a target.

main(model: utensil.loopflow.functions.dataflow.Model, features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Target[source]
Parameters
  • model (Model) – The prediction is from this model.

  • features (Features) – The features used for prediction.

Returns

A Target based on the model and features. The length of the target is identical to the number of rows of the features.

class utensil.loopflow.functions.dataflow.ParameterSearch(init_state=0, seed: int = 0, search_map: Optional[Dict] = None)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Randomly search the model parameters.

See more in utensil.random_search.

init_state
Type

int, default 0

seed
Type

int, default 0

search_map
Type

dict, default None

main()[source]
Returns

Next randomly generated parameters.

class utensil.loopflow.functions.dataflow.Score(dataset: str = MISSING, methods: Optional[Union[str, List[str]]] = None)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Calculate scores of a model, based on its prediction and a ground truth.

dataset

The name of the dataset. It is used to generate an informative output.

Type

str

methods

The method or a list of methods to score a model. Options are ACCURACY.

Type

str or list of str

main(prediction: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Features, utensil.loopflow.functions.dataflow.Dataset], ground_truth: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Dataset], model: utensil.loopflow.functions.dataflow.Model)[source]
Parameters
  • prediction (target, features or dataset) –

    If prediction is Target, it will be directly used to calculate the score without using the model.

    If it is Features, model will make a prediction based on it.

    If it is Dataset, model will make a prediction based on its features.

  • ground_truth (target or dataset) –

    If ground_truth is a Target, it is directly compared to prediction.

    If ground_truth is a Dataset, its target is compared to prediction.

  • model (Model) –

    The model to be scored.

    Note

    If prediction is Target, then the model is not used.

Returns

A list of scoring results. Each scoring result consists of two or three attributes: the scoring method name, the dataset name (if provided), and the score.

For example:

# if dataset name is 'MNIST'
[
    ('ACCURACY', 'MNIST', 0.812641),
    ('FSCORE', 'MNIST', 0.713278),
]

# if dataset name is not provided
[
    ('ACCURACY', 0.812641),
    ('FSCORE', 0.713278),
]
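The shape of the returned list can be reproduced with a small sketch. Only ACCURACY is implemented here, since it is the only option the documentation lists; accuracy and score are hypothetical helpers that operate on plain lists rather than Target objects.

```python
from typing import Any, List, Optional, Sequence, Tuple


def accuracy(prediction: Sequence[Any], ground_truth: Sequence[Any]) -> float:
    """Fraction of positions where prediction equals ground truth."""
    if len(prediction) != len(ground_truth):
        raise ValueError("prediction and ground truth differ in length")
    correct = sum(p == g for p, g in zip(prediction, ground_truth))
    return correct / len(ground_truth)


def score(
    prediction: Sequence[Any],
    ground_truth: Sequence[Any],
    methods: Sequence[str] = ("ACCURACY",),
    dataset: Optional[str] = None,
) -> List[Tuple]:
    """Build (method, [dataset,] score) tuples. Hypothetical helper."""
    scorers = {"ACCURACY": accuracy}
    results = []
    for name in methods:
        value = scorers[name](prediction, ground_truth)
        # The dataset name is included only when one was provided.
        results.append((name, value) if dataset is None else (name, dataset, value))
    return results


print(score([1, 0, 1, 1], [1, 0, 0, 1]))
# [('ACCURACY', 0.75)]
print(score([1, 0, 1, 1], [1, 0, 0, 1], dataset="MNIST"))
# [('ACCURACY', 'MNIST', 0.75)]
```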

class utensil.loopflow.functions.dataflow.ChangeTypeTo(to_type: str)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Change the type of a given arr.

to_type

The arr will be this type. Options are INTEGER, FLOAT.

Type

str

main(arr: Union[utensil.loopflow.functions.dataflow.Feature, utensil.loopflow.functions.dataflow.Target])[source]
Parameters

arr (Feature or Target) – The type of this will be changed.

Returns

The arr with type changed to to_type.
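A type change of this kind can be sketched as a lookup from the option name to a conversion function. The helper below is hypothetical and works on plain lists; how the library itself treats missing values or non-numeric entries is not covered by this sketch.

```python
from typing import Any, Callable, Dict, List, Sequence

# Option names as documented; the mapping to Python types is an assumption.
CASTS: Dict[str, Callable[[Any], Any]] = {"INTEGER": int, "FLOAT": float}


def change_type_to(arr: Sequence[Any], to_type: str) -> List[Any]:
    """Convert every element of arr to the requested type (hypothetical helper)."""
    if to_type not in CASTS:
        raise ValueError(f"unsupported to_type: {to_type!r}")
    cast = CASTS[to_type]
    return [cast(x) for x in arr]


print(change_type_to([1.0, 2.0, 3.0], "INTEGER"))  # [1, 2, 3]
print(change_type_to([1, 2, 3], "FLOAT"))          # [1.0, 2.0, 3.0]
```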

Module contents

class utensil.loopflow.functions.Dummy[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Identity function.

Returns whatever it gets.

>>> Dummy().main('anything')
'anything'
main(a: Any = MISSING)[source]
class utensil.loopflow.functions.Default(default)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Implements a default behavior.

Return a default value if triggered before getting anything.

default

the default value.

>>> default = Default('my_default')

This will return the input.

>>> default.main('my_input')
'my_input'

This will return the default value.

>>> default.main()
'my_default'

main(o: Any = MISSING)[source]
class utensil.loopflow.functions.Add(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Add a predefined constant, i.e., n+a.

a

the constant value to be added.

>>> p = Add(3)
>>> p.main(5)
8
>>> p.main(9)
12
main(n)[source]
Parameters

n – value to which a is added.

Returns

n+a.

class utensil.loopflow.functions.LessEqual(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than or equal to a constant, i.e., b <= a.

a

the constant value to be compared with.

>>> LessEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> LessEqual(5).main(10)
ConditionValue(c=False, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b <= a, and whose v is b.

class utensil.loopflow.functions.Equal(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is equal to a constant, i.e., b == a.

a

the constant value to be compared with.

>>> Equal(3).main(3)
ConditionValue(c=True, v=3)
>>> Equal(5).main(10)
ConditionValue(c=False, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b == a, and whose v is b.

class utensil.loopflow.functions.GreaterEqual(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than or equal to a constant, i.e., b >= a.

a

the constant value to be compared with.

>>> GreaterEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> GreaterEqual(15).main(10)
ConditionValue(c=False, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b >= a, and whose v is b.

class utensil.loopflow.functions.LessThan(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than a constant, i.e., b < a.

a

the constant value to be compared with.

>>> LessThan(3).main(3)
ConditionValue(c=False, v=3)
>>> LessThan(15).main(10)
ConditionValue(c=True, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b < a, and whose v is b.

class utensil.loopflow.functions.GreaterThan(a)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than a constant, i.e., b > a.

a

the constant value to be compared with.

>>> GreaterThan(3).main(3)
ConditionValue(c=False, v=3)
>>> GreaterThan(5).main(10)
ConditionValue(c=True, v=10)
main(b) utensil.loopflow.functions.basic.ConditionValue[source]
Parameters

b – value to be compared with a.

Returns

A ConditionValue whose c is True if b > a, and whose v is b.

class utensil.loopflow.functions.Feature(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]

Bases: pandas.core.series.Series

A feature of a dataset.

Feature is an individual measurable property or characteristic of a phenomenon. It can be a list of numbers or strings, with or without missing values. The length of a feature (missing values included) should equal the number of instances in the dataset.

class utensil.loopflow.functions.Features(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]

Bases: pandas.core.frame.DataFrame

A list of features.

Features is a list of Feature. It can be represented as a matrix of numbers, strings, missing values, etc.

class utensil.loopflow.functions.Target(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]

Bases: pandas.core.series.Series

The target of a dataset.

Target is the output corresponding to the input variables. Typically, it is the variable that a supervised model tries to learn to predict, either numerical or categorical.

class utensil.loopflow.functions.Dataset(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features)[source]

Bases: object

A dataset used to train a model or to let a model predict its target.

A pair of Target and Features. In the supervised case, use both the target and the features to train or score a model; to predict only, use only the features. The length of the target should equal the length of every feature in features, i.e., the number of instances.

>>> dataset = Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> dataset.nrows
3
>>> dataset.ncols
4
>>> bad_dataset = Dataset(
...     Target(np.random.randint(2, size=2)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> bad_dataset.nrows
Traceback (most recent call last):
...
ValueError: rows of target and that of features should be the same
target: utensil.loopflow.functions.dataflow.Target

The target of the dataset.

features: utensil.loopflow.functions.dataflow.Features

The features of the dataset.

property nrows

Number of rows/instances.

property ncols

Number of columns/features.

class utensil.loopflow.functions.Model[source]

Bases: object

A base model class to be trained and to predict target based on a dataset.

Before calling Model.train(), the model is untrained and should not be used to predict. After that, Model.predict() can be called to predict the Target of Features.

train(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Model[source]

Train a model.

Use self as a base model to train on dataset for a trained model.

Subclasses should override this method to provide an implementation.

>>> Model().train(Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... ))
Traceback (most recent call last):
...
NotImplementedError

Parameters

dataset (Dataset) – dataset to be trained on.

Returns

A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Target[source]

Predict the target.

A model returned from Model.train() can predict the Target for a given Features.

Subclasses should override this method to provide an implementation.

>>> Model().predict(Features(np.random.random(size=(3, 4))))
Traceback (most recent call last):
...
NotImplementedError

Parameters

features (Features) – the features used to predict the Target.

Returns

The prediction of Target.

class utensil.loopflow.functions.SklearnModel(model)[source]

Bases: utensil.loopflow.functions.dataflow.Model

A wrapper for sklearn models.

>>> from sklearn.linear_model import LinearRegression
>>> model = SklearnModel(LinearRegression())
>>> target = Target([1, 2, 3])
>>> features = Features([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> model = model.train(Dataset(target, features))
>>> model.predict(features + 1)
0    1.25
1    2.25
2    3.25
dtype: float64
train(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Model[source]

Train a model.

Use self as a base model to train on dataset for a trained model. Typically the fit method of the sklearn model is used.

Parameters

dataset (Dataset) – dataset to be trained on.

Returns

A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Target[source]

Predict the target.

A model returned from Model.train() can predict the Target for a given Features. Typically the predict method of the sklearn model is used.

Parameters

features (Features) – the features used to predict the Target.

Returns

The prediction of Target.

class utensil.loopflow.functions.LoadData(dformat: str, url: str, target: str, features: Dict[int, str])[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Load a dataset from a URL.

The URL can be a path. The data format can be SVMLIGHT.

dformat

Data format. The only valid option is SVMLIGHT.

Todo

More formats are needed

  1. CSV

  2. HDF5

Type

str

url

URL for the dataset. Should be a path or a URL with the scheme http, https, or file.

Todo

More types are needed

  1. sklearn data.

Type

str

target

The column of the dataset treated as a target.

Type

str

features

A mapping from the 0-based index of a column to its name. This is useful when the dataset does not contain its own column names, as in the svmlight format.

Type

dict[int, str]

main() utensil.loopflow.functions.dataflow.Dataset[source]
class utensil.loopflow.functions.FilterRows(filter_by: Dict[str, Any])[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Filter rows of Dataset.

Filter rows of dataset by the value of its Target.

filter_by

Specifies which column to filter by and which values to keep. Typical usage is to filter TARGET with a list of values. For example, filter_by={"TARGET": [1, 2]} filters the dataset so that the target column contains only 1 or 2.

Type

dict[str, Any]

main(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Dataset[source]
Parameters

dataset (Dataset) – the dataset to be filtered.

Returns

A filtered dataset.

class utensil.loopflow.functions.SamplingRows(number: int = MISSING, ratio: float = MISSING, stratified: bool = False, replace: bool = False, random_seed: Union[None, int] = None, return_rest: bool = False)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Sample rows of a dataset.

This method samples a dataset to a specific number of rows or to a ratio.

number

The sampled dataset will have this many rows. Ignored when ratio is set.

Type

int

ratio

The sampled dataset will have ratio * dataset.nrows rows. Takes precedence over number.

Type

float, default 1.0 if number is not set

stratified

If True, the dataset will be sampled in a stratified manner. That is, there will be the same number of rows for each category of the dataset target, if possible.

Type

bool, default False

replace

If True, the dataset will be sampled with replacement and a row may be selected multiple times. An exception is raised if replace is False and number is larger than dataset.nrows or ratio is larger than 1.

Type

bool, default False

random_seed

Random seed used to sample the dataset. It is used to set numpy.random.BitGenerator. See the NumPy documentation for more information.

Type

None or int, default None

return_rest

If False, only the sampled dataset is returned.

If True, this method will return a dictionary of two datasets,

{
    'sampled': sampled_dataset,
    'rest': rest_dataset,
}

rest_dataset contains all rows not in sampled_dataset.

Note

Even if sampled_dataset is sampled with replacement, rest_dataset does not contain duplicated rows.

Type

bool, default False

main(dataset: utensil.loopflow.functions.dataflow.Dataset) Union[utensil.loopflow.functions.dataflow.Dataset, Dict[str, utensil.loopflow.functions.dataflow.Dataset]][source]
Parameters

dataset (Dataset) – the dataset to be sampled.

Returns

A sampled dataset or a dictionary of the sampled dataset and the rest dataset.

class utensil.loopflow.functions.MakeDataset[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make a dataset using target and features.

main(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Dataset[source]
Parameters
  • target (Target) – the input target.

  • features (Features) – the input features.

Returns

A dataset consisting of target and features.

class utensil.loopflow.functions.GetTarget[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get target from a dataset.

main(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Target[source]
Parameters

dataset (Dataset) – get target from this dataset.

Returns

The target of the dataset.

class utensil.loopflow.functions.GetFeature(feature: str)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get feature from a dataset with a given name.

feature

This feature will be retrieved from the dataset.

Type

str

main(dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Feature[source]
Parameters

dataset (Dataset) – get feature from this dataset.

Returns

The feature with the given name of the dataset.

class utensil.loopflow.functions.MergeFeatures[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Merge a list of Feature objects into a Features object.

main(*features: utensil.loopflow.functions.dataflow.Feature) utensil.loopflow.functions.dataflow.Features[source]
Parameters

*features (list of Feature) – the features to be merged.

Returns

A Features object containing the given features.
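How MakeDataset, GetTarget, GetFeature and MergeFeatures relate to one another can be sketched with toy stand-ins. Everything below is hypothetical: ToyFeature, ToyDataset and the helper functions are illustrative names, the real Feature, Features, Target and Dataset classes in utensil.loopflow.functions.dataflow may be structured differently, and the length check is an assumption.

```python
from typing import Dict, List, NamedTuple


class ToyFeature(NamedTuple):
    """Hypothetical stand-in for Feature: a named column of values."""
    name: str
    values: List[float]


class ToyDataset(NamedTuple):
    """Hypothetical stand-in for Dataset: a target plus named features."""
    target: List[float]
    features: Dict[str, List[float]]


def merge_features(*features: ToyFeature) -> Dict[str, List[float]]:
    # Collect individual features into one mapping keyed by name.
    return {f.name: f.values for f in features}


def make_dataset(target, features) -> ToyDataset:
    # Assumption: every feature has one value per target row.
    for name, values in features.items():
        if len(values) != len(target):
            raise ValueError(f"feature {name!r} does not match target length")
    return ToyDataset(target, features)


def get_target(dataset: ToyDataset) -> List[float]:
    return dataset.target


def get_feature(dataset: ToyDataset, feature: str) -> ToyFeature:
    return ToyFeature(feature, dataset.features[feature])


height = ToyFeature("height", [1.6, 1.8, 1.7])
weight = ToyFeature("weight", [55.0, 80.0, 68.0])
dataset = make_dataset([0, 1, 1], merge_features(height, weight))
```

In this sketch, get_feature(dataset, "height") returns the same feature that was merged in, and get_target(dataset) returns the original target.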

class utensil.loopflow.functions.LinearNormalize(upper: Optional[Dict[str, Any]] = None, lower: Optional[Dict[str, Any]] = None)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Perform linear normalization of a 1d array.

Linearly maps the given array from range (u1, l1) to (u2, l2).

upper

Sets u1=upper["FROM"] and u2=upper["TO"]. Each of u1 and u2 should be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type

dict of FROM and TO, default both MAX

lower

Sets l1=lower["FROM"] and l2=lower["TO"]. Each of l1 and l2 should be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type

dict of FROM and TO, default both MIN

main(arr1d: numpy.ndarray) numpy.ndarray[source]
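The mapping from (u1, l1) to (u2, l2) described above amounts to y = (x - l1) * (u2 - l2) / (u1 - l1) + l2. The following sketch shows that arithmetic under stated assumptions: linear_normalize and resolve_bound are hypothetical names, plain Python lists stand in for numpy.ndarray, and the handling of a degenerate range (u1 equal to l1) is a guess rather than documented behavior.

```python
def resolve_bound(bound, values):
    """Turn 'MAX' / 'MIN' into the array extremum; pass numbers through."""
    if isinstance(bound, str):
        if bound == "MAX":
            return float(max(values))
        if bound == "MIN":
            return float(min(values))
        raise ValueError(f"unknown bound {bound!r}")
    return float(bound)


def linear_normalize(values, upper=None, lower=None):
    """Hypothetical sketch: map values linearly from (u1, l1) to (u2, l2)."""
    # Documented defaults: both upper bounds MAX, both lower bounds MIN.
    upper = upper or {"FROM": "MAX", "TO": "MAX"}
    lower = lower or {"FROM": "MIN", "TO": "MIN"}

    u1 = resolve_bound(upper["FROM"], values)
    u2 = resolve_bound(upper["TO"], values)
    l1 = resolve_bound(lower["FROM"], values)
    l2 = resolve_bound(lower["TO"], values)

    if u1 == l1:
        # Assumption: a zero-width source range cannot be mapped.
        raise ValueError("source range has zero width")

    scale = (u2 - l2) / (u1 - l1)
    return [(x - l1) * scale + l2 for x in values]


# Rescale to [-1, 1] using the array's own minimum and maximum.
scaled = linear_normalize(
    [2, 4, 6],
    upper={"FROM": "MAX", "TO": 1},
    lower={"FROM": "MIN", "TO": -1},
)
# scaled == [-1.0, 0.0, 1.0]
```

With the default arguments, the source and destination ranges coincide, so the array is returned unchanged in value.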
class utensil.loopflow.functions.MakeModel(method)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make an untrained model.

method

The model will be trained with this method. Options are XGBOOST_REGRESSOR, XGBOOST_CLASSIFIER.

Type

str

main(model_params: Dict[str, Any]) utensil.loopflow.functions.dataflow.Model[source]
Parameters

model_params (dict) –

The parameters to create the model. Based on the method, different parameters can be set.

Returns

An untrained Model.

class utensil.loopflow.functions.Train[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Train a model.

main(model: utensil.loopflow.functions.dataflow.Model, dataset: utensil.loopflow.functions.dataflow.Dataset) utensil.loopflow.functions.dataflow.Model[source]
Parameters
  • model (Model) – The model to be trained.

  • dataset (Dataset) – The dataset to be trained on.

Returns

A trained Model.

class utensil.loopflow.functions.Predict[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Predict a target.

main(model: utensil.loopflow.functions.dataflow.Model, features: utensil.loopflow.functions.dataflow.Features) utensil.loopflow.functions.dataflow.Target[source]
Parameters
  • model (Model) – The prediction is from this model.

  • features (Features) – The features used for prediction.

Returns

A Target based on the model and features. The length of the target is identical to the number of rows of the features.

class utensil.loopflow.functions.ParameterSearch(init_state=0, seed: int = 0, search_map: Optional[Dict] = None)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Randomly search the model parameters.

See more in utensil.random_search.

init_state
Type

int, default 0

seed
Type

int, default 0

search_map
Type

dict, default None

main()[source]
Returns

Next randomly generated parameters.
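The idea of a seeded random search that yields a fresh parameter set on every call can be sketched as follows. This is only an illustration: ToyParameterSearch is a hypothetical class, the search_map format used here (parameter name mapped to a list of candidate values) is an assumption and may differ from what utensil.random_search expects, and init_state is left out because its meaning is not documented here.

```python
import random


class ToyParameterSearch:
    """Hypothetical sketch of a seeded random parameter search."""

    def __init__(self, seed=0, search_map=None):
        # A dedicated generator keeps the sequence reproducible per seed.
        self._rng = random.Random(seed)
        # Assumed format: {parameter name: list of candidate values}.
        self._search_map = search_map or {}

    def main(self):
        # Draw one candidate per parameter; each call advances the
        # generator, so repeated calls give a sequence of parameter sets.
        return {
            name: self._rng.choice(candidates)
            for name, candidates in self._search_map.items()
        }


search_map = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
}
search = ToyParameterSearch(seed=0, search_map=search_map)
first_params = search.main()
second_params = search.main()
```

Two searches built with the same seed and search_map produce the same sequence of parameter sets, which keeps experiments repeatable.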

class utensil.loopflow.functions.Score(dataset: str = MISSING, methods: Optional[Union[str, List[str]]] = None)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Calculate scores of a model, based on its prediction and a ground truth.

dataset

The name of the dataset. It is used to generate an informative output.

Type

str

methods

The method or a list of methods to score a model. Options are ACCURACY.

Type

str or list of str

main(prediction: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Features, utensil.loopflow.functions.dataflow.Dataset], ground_truth: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Dataset], model: utensil.loopflow.functions.dataflow.Model)[source]
Parameters
  • prediction (Target, Features or Dataset) –

    If prediction is Target, it will be directly used to calculate the score without using the model.

    If it is Features, model will make a prediction based on it.

    If it is Dataset, model will make a prediction based on its features.

  • ground_truth (Target or Dataset) –

    If ground_truth is a Target, it is directly compared to prediction.

    If ground_truth is a Dataset, its target is compared to prediction.

  • model (Model) –

    The model to be scored.

    Note

    If prediction is Target, then the model is not used.

Returns

A list of scoring results. Each scoring result consists of two or three attributes: the scoring method name, the dataset name (if provided), and the score.

For example:

# if dataset name is 'MNIST'
[
    ('ACCURACY', 'MNIST', 0.812641),
    ('FSCORE', 'MNIST', 0.713278),
]

# if dataset name is not provided
[
    ('ACCURACY', 0.812641),
    ('FSCORE', 0.713278),
]
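The shape of the returned list can be illustrated with a small sketch. It assumes that ACCURACY means the fraction of predictions equal to the ground truth; toy_score and accuracy are hypothetical helpers operating on plain lists, not the library's Target objects or its actual scoring code.

```python
def accuracy(prediction, ground_truth):
    """Fraction of positions where prediction equals ground truth."""
    if len(prediction) != len(ground_truth):
        raise ValueError("prediction and ground truth differ in length")
    matches = sum(p == t for p, t in zip(prediction, ground_truth))
    return matches / len(ground_truth)


# Hypothetical registry from method name to scoring function.
SCORERS = {"ACCURACY": accuracy}


def toy_score(prediction, ground_truth, methods="ACCURACY", dataset=None):
    """Hypothetical sketch: build the list of scoring results."""
    if isinstance(methods, str):
        methods = [methods]  # accept a single method or a list of methods
    results = []
    for method in methods:
        value = SCORERS[method](prediction, ground_truth)
        if dataset is None:
            results.append((method, value))           # two attributes
        else:
            results.append((method, dataset, value))  # three attributes
    return results


named = toy_score([1, 0, 1, 1], [1, 0, 0, 1], dataset="MNIST")
# named == [('ACCURACY', 'MNIST', 0.75)]
unnamed = toy_score([1, 0, 1, 1], [1, 0, 0, 1])
# unnamed == [('ACCURACY', 0.75)]
```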

class utensil.loopflow.functions.ChangeTypeTo(to_type: str)[source]

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Change the type of a given arr.

to_type

The arr will be this type. Options are INTEGER, FLOAT.

Type

str

main(arr: Union[utensil.loopflow.functions.dataflow.Feature, utensil.loopflow.functions.dataflow.Target])[source]
Parameters

arr (Feature or Target) – The array whose type will be changed.

Returns

The arr with type changed to to_type.
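A minimal sketch of the to_type dispatch, assuming INTEGER and FLOAT correspond to Python's int and float. change_type_to is a hypothetical helper working on a plain list; the real class operates on Feature or Target objects and may convert differently (for example, via NumPy dtypes).

```python
# Assumed mapping from the documented option names to Python types.
TYPE_MAP = {"INTEGER": int, "FLOAT": float}


def change_type_to(arr, to_type):
    """Hypothetical sketch: convert every element of arr to to_type."""
    try:
        cast = TYPE_MAP[to_type]
    except KeyError:
        raise ValueError(f"unsupported to_type {to_type!r}") from None
    return [cast(x) for x in arr]


as_int = change_type_to([1.9, 2.0, -0.5], "INTEGER")
# as_int == [1, 2, 0]  (int() truncates toward zero)
as_float = change_type_to([1, 2], "FLOAT")
# as_float == [1.0, 2.0]
```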