utensil.loopflow.functions package¶

Submodules¶

utensil.loopflow.functions.basic module¶

Provide NodeProcessFunction for basic usage.

Example:

from utensil.loopflow.functions import basic
from utensil.loopflow.loopflow import register_node_process_functions
register_node_process_functions(basic)

utensil.loopflow.functions.basic.MISSING = MISSING¶

Missing token.

Used to indicate a missing value.

class utensil.loopflow.functions.basic.Dummy[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Identity function.

Returns whatever it gets.

>>> Dummy().main('anything')
'anything'

main(a: Any = MISSING)[source]¶

class utensil.loopflow.functions.basic.Default(default)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Implements a default behavior.

Return a default value if triggered before getting anything.

default¶: the default value.

>>> default = Default('my_default')

This will return the input.

>>> default.main('my_input')
'my_input'

This will return the default value.

>>> default.main()
'my_default'

main(o: Any = MISSING)[source]¶
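The sentinel pattern behind MISSING and Default can be illustrated with a stand-alone sketch. The names `_MISSING` and `DefaultSketch` are hypothetical and do not come from utensil; the code only mirrors the documented behavior (return the input when one is given, otherwise the stored default) and is not the library's source.

```python
# Stand-alone sketch; it does not import utensil.
_MISSING = object()  # hypothetical sentinel, distinct from None


class DefaultSketch:
    """Return the input if one was given, otherwise a stored default."""

    def __init__(self, default):
        self.default = default

    def main(self, o=_MISSING):
        # Compare by identity so that None, 0 and '' count as real inputs.
        return self.default if o is _MISSING else o


node = DefaultSketch('my_default')
with_input = node.main('my_input')    # an input was given
without_input = node.main()           # nothing given, fall back to default
none_input = node.main(None)          # None is treated as a real value
```

A dedicated sentinel lets None pass through as a legitimate value, which a plain `None` default could not do.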

class utensil.loopflow.functions.basic.Add(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Add a predefined constant, i.e., n+a.

a¶: the constant value to be added.

>>> p = Add(3)
>>> p.main(5)
8
>>> p.main(9)
12

main(n)[source]¶

Parameters: n – value to which a is added.
Returns: n+a.

namedtuple utensil.loopflow.functions.basic.ConditionValue(c, v)¶

Bases: namedtuple()

A pair of a boolean and a value for flow control.

c¶: a boolean value indicating if condition is passed.

v¶: the value to be used.

ConditionValue(c, v)

Fields

c – Alias for field number 0
v – Alias for field number 1

class utensil.loopflow.functions.basic.LessEqual(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than or equal to a constant, i.e., b <= a.

a¶: the constant value to be compared with.

>>> LessEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> LessEqual(5).main(10)
ConditionValue(c=False, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b <= a and whose v is b.
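The comparison nodes all return the same pair shape. A minimal stand-alone sketch of that shape follows; `ConditionValueSketch` and `LessEqualSketch` are hypothetical stand-ins written for illustration, not the utensil classes themselves.

```python
from collections import namedtuple

# Hypothetical stand-in for the documented ConditionValue(c, v).
ConditionValueSketch = namedtuple('ConditionValueSketch', ['c', 'v'])


class LessEqualSketch:
    """Compare an incoming value b against a fixed constant a."""

    def __init__(self, a):
        self.a = a

    def main(self, b):
        # c carries the test result, v carries the original value onward.
        return ConditionValueSketch(c=b <= self.a, v=b)


result = LessEqualSketch(3).main(3)
passed, value = result  # a namedtuple unpacks like a plain tuple
```

Carrying the value alongside the boolean lets a downstream node branch on c while still receiving v.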

class utensil.loopflow.functions.basic.Equal(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is equal to a constant, i.e., b == a.

a¶: the constant value to be compared with.

>>> Equal(3).main(3)
ConditionValue(c=True, v=3)
>>> Equal(5).main(10)
ConditionValue(c=False, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b == a and whose v is b.

class utensil.loopflow.functions.basic.GreaterEqual(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than or equal to a constant, i.e., b >= a.

a¶: the constant value to be compared with.

>>> GreaterEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> GreaterEqual(15).main(10)
ConditionValue(c=False, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b >= a and whose v is b.

class utensil.loopflow.functions.basic.LessThan(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than a constant, i.e., b < a.

a¶: the constant value to be compared with.

>>> LessThan(3).main(3)
ConditionValue(c=False, v=3)
>>> LessThan(15).main(10)
ConditionValue(c=True, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b < a and whose v is b.

class utensil.loopflow.functions.basic.GreaterThan(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than a constant, i.e., b > a.

a¶: the constant value to be compared with.

>>> GreaterThan(3).main(3)
ConditionValue(c=False, v=3)
>>> GreaterThan(5).main(10)
ConditionValue(c=True, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b > a and whose v is b.

utensil.loopflow.functions.dataflow module¶

Provide NodeProcessFunction for machine learning workflows.

Example:

from utensil.loopflow.functions import dataflow
from utensil.loopflow.loopflow import register_node_process_functions
register_node_process_functions(dataflow)

class utensil.loopflow.functions.dataflow.Feature(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]¶

Bases: pandas.core.series.Series

A feature of a dataset.

A feature is an individual measurable property or characteristic of a phenomenon. It can be a list of numbers or strings, with or without missing values. The length of a feature (missing values included) should equal the number of instances in the dataset.

class utensil.loopflow.functions.dataflow.Features(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]¶

Bases: pandas.core.frame.DataFrame

A list of features.

Features is a list of Feature. It can be represented as a matrix of numbers, strings, missing values, etc.

class utensil.loopflow.functions.dataflow.Target(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]¶

Bases: pandas.core.series.Series

The target of a dataset.

Target is the output that corresponds to the input variables. Typically, it is the variable a supervised model is trying to learn to predict, either numerical or categorical.

class utensil.loopflow.functions.dataflow.Dataset(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features)[source]¶

Bases: object

A dataset used to train a model or to let a model predict its target.

A pair of Target and Features. In the supervised case, use both target and features to train or to score a model; to predict only, use only the features. The length of target should be identical to the length of every feature in features, i.e., the number of instances.

>>> dataset = Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> dataset.nrows
3
>>> dataset.ncols
4
>>> bad_dataset = Dataset(
...     Target(np.random.randint(2, size=2)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> bad_dataset.nrows
Traceback (most recent call last):
...
ValueError: rows of target and that of features should be the same

target: utensil.loopflow.functions.dataflow.Target¶: The target of the dataset.

features: utensil.loopflow.functions.dataflow.Features¶: The features of the dataset.

property nrows¶: Number of rows/instances.

property ncols¶: Number of columns/features.
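The doctest above suggests that the row-count check runs when nrows is read rather than at construction. A stand-alone sketch of that behavior, using plain lists instead of pandas objects, is given below; `DatasetSketch` is a hypothetical name and the implementation is an assumption, not the utensil source.

```python
class DatasetSketch:
    """Hypothetical pair of a target list and a list of feature rows."""

    def __init__(self, target, features):
        # No validation here: the mismatch only surfaces on access.
        self.target = target
        self.features = features

    @property
    def nrows(self):
        if len(self.target) != len(self.features):
            raise ValueError(
                'rows of target and that of features should be the same')
        return len(self.target)

    @property
    def ncols(self):
        # Number of columns is the width of one feature row.
        return len(self.features[0]) if self.features else 0


good = DatasetSketch([0, 1, 1], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
bad = DatasetSketch([0, 1], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
try:
    bad.nrows
    error = None
except ValueError as exc:
    error = str(exc)
```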

class utensil.loopflow.functions.dataflow.Model[source]¶

Bases: object

A base model class to be trained and to predict target based on a dataset.

Before calling Model.train(), the model is untrained and should not be used to predict. After that, Model.predict() can be called to predict the Target of Features.

train(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model[source]¶

Train a model.

Uses self as the base model and trains it on dataset to produce a trained model.

Should be overridden by subclass for implementation.

>>> Model().train(Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... ))
Traceback (most recent call last):
...
NotImplementedError

Parameters: dataset (Dataset) – dataset to be trained on.
Returns: A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target[source]¶

Predict the target.

A model returned from Model.train() can predict the Target for a given Features.

Should be overridden by subclass for implementation.

>>> Model().predict(Features(np.random.random(size=(3, 4))))
Traceback (most recent call last):
...
NotImplementedError

Parameters: features (Features) – the features used to predict the Target.
Returns: The prediction of Target.
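The train/predict contract can be sketched with a toy subclass. Everything below is hypothetical and stand-alone: `ModelSketch` imitates the documented base class, `MeanModelSketch` is an invented example model, and the dataset is a plain (target, features) tuple rather than a Dataset object.

```python
class ModelSketch:
    """Hypothetical base class mirroring the documented contract."""

    def train(self, dataset):
        raise NotImplementedError

    def predict(self, features):
        raise NotImplementedError


class MeanModelSketch(ModelSketch):
    """Toy model that always predicts the mean of the training target."""

    def __init__(self):
        self.mean = None

    def train(self, dataset):
        target, _features = dataset
        self.mean = sum(target) / len(target)
        return self  # train returns the trained model

    def predict(self, features):
        if self.mean is None:
            raise RuntimeError('model must be trained before predicting')
        # One prediction per feature row.
        return [self.mean for _ in features]


trained = MeanModelSketch().train(([1, 2, 3], [[0], [1], [2]]))
prediction = trained.predict([[5], [6]])
```

Returning the model from train lets calls be chained, as in the SklearnModel doctest where the result of train is reassigned before predict is called.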

class utensil.loopflow.functions.dataflow.SklearnModel(model)[source]¶

Bases: utensil.loopflow.functions.dataflow.Model

A wrapper for sklearn models.

>>> from sklearn.linear_model import LinearRegression
>>> model = SklearnModel(LinearRegression())
>>> target = Target([1, 2, 3])
>>> features = Features([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> model = model.train(Dataset(target, features))
>>> model.predict(features + 1)
0    1.25
1    2.25
2    3.25
dtype: float64

train(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model[source]¶

Train a model.

Uses self as the base model and trains it on dataset to produce a trained model. Typically the fit method of the sklearn model is used.

Parameters: dataset (Dataset) – dataset to be trained on.
Returns: A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target[source]¶

Predict the target.

A model returned from Model.train() can predict the Target for a given Features. Typically the predict method of the sklearn model is used.

Parameters: features (Features) – the features used to predict the Target.
Returns: The prediction of Target.

class utensil.loopflow.functions.dataflow.LoadData(dformat: str, url: str, target: str, features: Dict[int, str])[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Load a dataset from a URL.

URL can be a path. Data format can be SVMLIGHT.

dformat¶

Data format. Valid options are SVMLIGHT.

Todo

More formats are needed:

CSV
HDF5

Type: str

url¶

URL for the dataset. Should be a path or a URL with the scheme http, https or file.

Todo

More types are needed

sklearn data.

Type: str

target¶

The column of the dataset treated as a target.

Type: str

features¶

A mapping from the 0-based index of a column to its name. This is useful when the dataset itself does not contain its own column names, for example, in the svmlight format.

Type: dict[int, str]

main() → utensil.loopflow.functions.dataflow.Dataset[source]¶

class utensil.loopflow.functions.dataflow.FilterRows(filter_by: Dict[str, Any])[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Filter rows of Dataset.

Filter rows of dataset by the value of its Target.

filter_by¶

Indicates which column to filter by and which values to keep. Typical usage is to filter TARGET with a list of values. For example, filter_by={"TARGET": [1, 2]} keeps only the rows whose target is 1 or 2.

Type: dict[str, Any]

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Dataset[source]¶

Parameters: dataset (Dataset) – the dataset to be filtered.
Returns: A filtered dataset.
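Filtering by target value can be sketched as follows. The helper `filter_rows_sketch` is hypothetical and works on plain lists; it is an illustration of the described behavior under that assumption, not the library implementation.

```python
def filter_rows_sketch(target, features, allowed):
    """Keep only the rows whose target value is in `allowed`."""
    allowed = set(allowed)
    # Pair each target value with its feature row so both stay aligned.
    kept = [(t, row) for t, row in zip(target, features) if t in allowed]
    new_target = [t for t, _ in kept]
    new_features = [row for _, row in kept]
    return new_target, new_features


target = [0, 1, 2, 1, 3]
features = [[10], [11], [12], [13], [14]]
# Corresponds to filter_by={"TARGET": [1, 2]} in the description above.
filtered_target, filtered_features = filter_rows_sketch(
    target, features, [1, 2])
```

Target and features are filtered together so that the result still has the same number of rows on both sides.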

class utensil.loopflow.functions.dataflow.SamplingRows(number: int = MISSING, ratio: float = MISSING, stratified: bool = False, replace: bool = False, random_seed: Union[None, int] = None, return_rest: bool = False)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Sample rows of a dataset.

This method samples a dataset to a specific number of rows or to a ratio.

number¶

The sampled dataset will have this many rows. Overridden by ratio.

Type: int

ratio¶

The sampled dataset will have ratio * dataset.nrows rows. Overrides number.

Type: float, default 1.0 if number is not set

stratified¶

If True, the dataset will be sampled in a stratified manner. That is, there will be the same number of rows for each category of the dataset target, if possible.

Type: bool, default False

replace¶

If True, the dataset will be sampled with replacement and a row may be selected multiple times. An exception is raised if replace is False and number is larger than dataset.nrows or ratio is larger than 1.

Type: bool, default False

random_seed¶

Random seed used to sample the dataset. It is used to set numpy.random.BitGenerator. See Numpy Documentation for more information.

Type: None or int, default None

return_rest¶

If False, only the sampled dataset is returned.

If True, this method will return a dictionary of two datasets,

{
    'sampled': sampled_dataset,
    'rest': rest_dataset,
}

rest_dataset contains all rows not in sampled_dataset.

Note

Even if sampled_dataset is sampled with replacement, rest_dataset does not contain duplicated rows.

Type: bool, default False

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → Union[utensil.loopflow.functions.dataflow.Dataset, Dict[str, utensil.loopflow.functions.dataflow.Dataset]][source]¶

Parameters: dataset (Dataset) – the dataset to be sampled.
Returns: A sampled dataset or a dictionary of the sampled dataset and the rest dataset.
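How number, ratio, replace and return_rest interact can be sketched on row indices. The function `sample_rows_sketch` is hypothetical; it uses the standard library `random` module rather than NumPy, truncates ratio * nrows with `int` (an assumption, the library's rounding is not documented here), and leaves out stratified sampling.

```python
import random


def sample_rows_sketch(nrows, number=None, ratio=None, replace=False,
                       random_seed=None, return_rest=False):
    """Return sampled row indices, optionally with the unsampled rest."""
    if ratio is None and number is None:
        ratio = 1.0  # documented default when number is not set
    if ratio is not None:
        number = int(ratio * nrows)  # ratio takes precedence over number
    if not replace and number > nrows:
        raise ValueError(
            'cannot sample more rows than exist without replacement')
    rng = random.Random(random_seed)
    if replace:
        sampled = [rng.randrange(nrows) for _ in range(number)]
    else:
        sampled = rng.sample(range(nrows), number)
    if not return_rest:
        return sampled
    chosen = set(sampled)
    # The rest holds each unsampled row once, even with replacement.
    rest = [i for i in range(nrows) if i not in chosen]
    return {'sampled': sampled, 'rest': rest}


split = sample_rows_sketch(10, ratio=0.5, random_seed=0, return_rest=True)
```

With return_rest=True the two parts can serve as a train/test split, since no row in 'rest' also appears in 'sampled'.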

class utensil.loopflow.functions.dataflow.MakeDataset[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make a dataset using target and features.

main(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Dataset[source]¶

Parameters

target (Target) – the input target.
features (Features) – the input features.

Returns

A dataset consisting of target and features.

class utensil.loopflow.functions.dataflow.GetTarget[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get target from a dataset.

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Target[source]¶

Parameters: dataset (Dataset) – get target from this dataset.
Returns: The target of the dataset.

class utensil.loopflow.functions.dataflow.GetFeature(feature: str)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get feature from a dataset with a given name.

feature¶

This feature will be retrieved from the dataset.

Type: str

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Feature[source]¶

Parameters: dataset (Dataset) – get feature from this dataset.
Returns: The feature with the given name of the dataset.

class utensil.loopflow.functions.dataflow.MergeFeatures[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Merge a list of Feature objects into a Features object.

main(*features: utensil.loopflow.functions.dataflow.Feature) → utensil.loopflow.functions.dataflow.Features[source]¶

Parameters: *features (list of Feature) – the features to be merged.
Returns: A Features object containing the listed features.

class utensil.loopflow.functions.dataflow.LinearNormalize(upper: Optional[Dict[str, Any]] = None, lower: Optional[Dict[str, Any]] = None)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Perform linear normalization of a 1d array.

Linearly maps the given array from range (u1, l1) to (u2, l2).

upper¶

Sets u1=upper["FROM"] and u2=upper["TO"]. u* should be a number or a string, MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type: dict of FROM and TO, default both MAX

lower¶

Sets l1=lower["FROM"] and l2=lower["TO"]. l* should be a number or a string, MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type: dict of FROM and TO, default both MIN

main(arr1d: numpy.ndarray) → numpy.ndarray[source]¶
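The mapping from (u1, l1) to (u2, l2) can be written as y = (x - l1) * (u2 - l2) / (u1 - l1) + l2. The sketch below applies that formula to a plain list; `linear_normalize_sketch` is a hypothetical helper, and the formula is the standard linear rescaling assumed from the description, not code taken from utensil.

```python
def linear_normalize_sketch(values, upper=None, lower=None):
    """Map values linearly from (u1, l1) to (u2, l2)."""
    # Documented defaults: upper is MAX on both sides, lower is MIN.
    upper = upper or {'FROM': 'MAX', 'TO': 'MAX'}
    lower = lower or {'FROM': 'MIN', 'TO': 'MIN'}

    def resolve(bound):
        # 'MAX' and 'MIN' refer to the extremes of the input values.
        if bound == 'MAX':
            return max(values)
        if bound == 'MIN':
            return min(values)
        return bound

    u1, u2 = resolve(upper['FROM']), resolve(upper['TO'])
    l1, l2 = resolve(lower['FROM']), resolve(lower['TO'])
    scale = (u2 - l2) / (u1 - l1)  # fails if u1 == l1 (constant input)
    return [(x - l1) * scale + l2 for x in values]


# Rescale so the minimum maps to 0 and the maximum maps to 1.
scaled = linear_normalize_sketch(
    [2, 4, 6],
    upper={'FROM': 'MAX', 'TO': 1},
    lower={'FROM': 'MIN', 'TO': 0},
)
unchanged = linear_normalize_sketch([2, 4, 6])
```

With the default bounds the source and destination ranges coincide, so the values are returned unchanged.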

class utensil.loopflow.functions.dataflow.MakeModel(method)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make an untrained model.

method¶

The model will use this method to train. Options are XGBOOST_REGRESSOR, XGBOOST_CLASSIFIER.

Type: str

main(model_params: Dict[str, Any]) → utensil.loopflow.functions.dataflow.Model[source]¶

Parameters

model_params (dict) –

The parameters to create the model. Based on the method, different parameters can be set.

XGBOOST_REGRESSOR (see the XGBoost documentation for more details):
- learning_rate
- max_depth
- n_estimators

XGBOOST_CLASSIFIER (see the XGBoost documentation for more details):
- learning_rate
- max_depth
- n_estimators

SKLEARN_GRADIENT_BOOSTING_CLASSIFIER (see the scikit-learn documentation for more details):
- learning_rate
- max_depth
- n_estimators

Returns

An untrained Model.

class utensil.loopflow.functions.dataflow.Train[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Train a model.

main(model: utensil.loopflow.functions.dataflow.Model, dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model[source]¶

Parameters

model (Model) – The model to be trained.
dataset (Dataset) – The dataset to be trained on.

Returns

A trained Model.

class utensil.loopflow.functions.dataflow.Predict[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Predict a target.

main(model: utensil.loopflow.functions.dataflow.Model, features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target[source]¶

Parameters

model (Model) – The prediction is from this model.
features (Features) – The features used for prediction.

Returns

A Target based on the model and features. The length of the target is identical to the number of rows of the features.

class utensil.loopflow.functions.dataflow.ParameterSearch(init_state=0, seed: int = 0, search_map: Optional[Dict] = None)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Randomly search the model parameters.

See more in utensil.random_search.

init_state¶

Type: int, default 0

seed¶

Type: int, default 0

search_map¶

Type: dict, default None

main()[source]¶

Returns: Next randomly generated parameters.

class utensil.loopflow.functions.dataflow.Score(dataset: str = MISSING, methods: Optional[Union[str, List[str]]] = None)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Calculate scores of a model, based on its prediction and a ground truth.

dataset¶

The name of the dataset. It is used to generate an informative output.

Type: str

methods¶

The method or a list of methods to score a model. Options are ACCURACY.

Type: str or list of str

main(prediction: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Features, utensil.loopflow.functions.dataflow.Dataset], ground_truth: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Dataset], model: utensil.loopflow.functions.dataflow.Model)[source]¶

Parameters

prediction (target, features or dataset) –
If prediction is Target, it will be directly used to calculate the score without using the model.

If it is Features, model will make a prediction based on it.

If it is Dataset, model will make a prediction based on its features.
ground_truth (target or dataset) –
If ground_truth is a Target, it is directly compared to prediction.

If ground_truth is a Dataset, its target is compared to prediction.
model (Model) –
The model to be scored.

Note

If prediction is Target, then the model is not used.

Returns

A list of scoring results. A scoring result consists of two or three attributes: the scoring method name, the dataset name (if provided), and the score.

For example:

# if dataset name is 'MNIST'
[
    ('ACCURACY', 'MNIST', 0.812641),
    ('FSCORE', 'MNIST', 0.713278),
]

# if dataset name is not provided
[
    ('ACCURACY', 0.812641),
    ('FSCORE', 0.713278),
]
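The shape of the returned list can be reproduced with a small stand-alone sketch for the ACCURACY method. Both `accuracy_sketch` and `score_sketch` are hypothetical helpers operating on plain lists; accuracy is computed as the fraction of matching positions, which is the usual definition and is assumed here.

```python
def accuracy_sketch(prediction, ground_truth):
    """Fraction of positions where prediction equals ground truth."""
    if len(prediction) != len(ground_truth):
        raise ValueError('prediction and ground truth differ in length')
    hits = sum(p == g for p, g in zip(prediction, ground_truth))
    return hits / len(ground_truth)


def score_sketch(prediction, ground_truth, dataset=None):
    """Return scoring results shaped like the example above."""
    score = accuracy_sketch(prediction, ground_truth)
    if dataset is None:
        # Two attributes: method name and score.
        return [('ACCURACY', score)]
    # Three attributes: method name, dataset name and score.
    return [('ACCURACY', dataset, score)]


named = score_sketch([1, 0, 1, 1], [1, 0, 0, 1], dataset='MNIST')
unnamed = score_sketch([1, 0, 1, 1], [1, 0, 0, 1])
```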

class utensil.loopflow.functions.dataflow.ChangeTypeTo(to_type: str)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Change the type of a given arr.

to_type¶

The arr will be this type. Options are INTEGER, FLOAT.

Type: str

main(arr: Union[utensil.loopflow.functions.dataflow.Feature, utensil.loopflow.functions.dataflow.Target])[source]¶

Parameters: arr (Feature or Target) – The type of this will be changed.
Returns: The arr with type changed to to_type.

Module contents¶

class utensil.loopflow.functions.Dummy[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Identity function.

Returns whatever it gets.

>>> Dummy().main('anything')
'anything'

main(a: Any = MISSING)[source]¶

class utensil.loopflow.functions.Default(default)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Implements a default behavior.

Return a default value if triggered before getting anything.

default¶: the default value.

>>> default = Default('my_default')

This will return the input.

>>> default.main('my_input')
'my_input'

This will return the default value.

>>> default.main()
'my_default'

main(o: Any = MISSING)[source]¶

class utensil.loopflow.functions.Add(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Add a predefined constant, i.e., n+a.

a¶: the constant value to be added.

>>> p = Add(3)
>>> p.main(5)
8
>>> p.main(9)
12

main(n)[source]¶

Parameters: n – value to which a is added.
Returns: n+a.

class utensil.loopflow.functions.LessEqual(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than or equal to a constant, i.e., b <= a.

a¶: the constant value to be compared with.

>>> LessEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> LessEqual(5).main(10)
ConditionValue(c=False, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b <= a and whose v is b.

class utensil.loopflow.functions.Equal(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is equal to a constant, i.e., b == a.

a¶: the constant value to be compared with.

>>> Equal(3).main(3)
ConditionValue(c=True, v=3)
>>> Equal(5).main(10)
ConditionValue(c=False, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b == a and whose v is b.

class utensil.loopflow.functions.GreaterEqual(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than or equal to a constant, i.e., b >= a.

a¶: the constant value to be compared with.

>>> GreaterEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> GreaterEqual(15).main(10)
ConditionValue(c=False, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b >= a and whose v is b.

class utensil.loopflow.functions.LessThan(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is less than a constant, i.e., b < a.

a¶: the constant value to be compared with.

>>> LessThan(3).main(3)
ConditionValue(c=False, v=3)
>>> LessThan(15).main(10)
ConditionValue(c=True, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b < a and whose v is b.

class utensil.loopflow.functions.GreaterThan(a)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Check whether the input is greater than a constant, i.e., b > a.

a¶: the constant value to be compared with.

>>> GreaterThan(3).main(3)
ConditionValue(c=False, v=3)
>>> GreaterThan(5).main(10)
ConditionValue(c=True, v=10)

main(b) → utensil.loopflow.functions.basic.ConditionValue[source]¶

Parameters: b – value to be compared with a.
Returns: a ConditionValue whose c is True if b > a and whose v is b.

class utensil.loopflow.functions.Feature(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]¶

Bases: pandas.core.series.Series

A feature of a dataset.

class utensil.loopflow.functions.Features(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)[source]¶

Bases: pandas.core.frame.DataFrame

A list of features.

Features is a list of Feature. It can be represented as a matrix of numbers, strings, missing values, etc.

class utensil.loopflow.functions.Target(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)[source]¶

Bases: pandas.core.series.Series

The target of a dataset.

Target is the output that corresponds to the input variables. Typically, it is the variable a supervised model is trying to learn to predict, either numerical or categorical.

class utensil.loopflow.functions.Dataset(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features)[source]¶

Bases: object

A dataset used to train a model or to let a model predict its target.

>>> dataset = Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> dataset.nrows
3
>>> dataset.ncols
4
>>> bad_dataset = Dataset(
...     Target(np.random.randint(2, size=2)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> bad_dataset.nrows
Traceback (most recent call last):
...
ValueError: rows of target and that of features should be the same

target: utensil.loopflow.functions.dataflow.Target¶: The target of the dataset.

features: utensil.loopflow.functions.dataflow.Features¶: The features of the dataset.

property nrows¶: Number of rows/instances.

property ncols¶: Number of columns/features.

class utensil.loopflow.functions.Model[source]¶

Bases: object

A base model class to be trained and to predict target based on a dataset.

Before calling Model.train(), the model is untrained and should not be used to predict. After that, Model.predict() can be called to predict the Target of Features.

train(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model[source]¶

Train a model.

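Uses self as the base model and trains it on dataset to produce a trained model.

Should be overridden by subclass for implementation.

>>> Model().train(Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... ))
Traceback (most recent call last):
...
NotImplementedError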

Parameters: dataset (Dataset) – dataset to be trained on.
Returns: A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target[source]¶

Predict the target.

A model returned from Model.train() can predict the Target for a given Features.

Should be overridden by subclass for implementation.

>>> Model().predict(Features(np.random.random(size=(3, 4))))
Traceback (most recent call last):
...
NotImplementedError

Parameters: features (Features) – the features used to predict the Target.
Returns: The prediction of Target.

class utensil.loopflow.functions.SklearnModel(model)[source]¶

Bases: utensil.loopflow.functions.dataflow.Model

A wrapper for sklearn models.

>>> from sklearn.linear_model import LinearRegression
>>> model = SklearnModel(LinearRegression())
>>> target = Target([1, 2, 3])
>>> features = Features([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> model = model.train(Dataset(target, features))
>>> model.predict(features + 1)
0    1.25
1    2.25
2    3.25
dtype: float64

train(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model[source]¶

Train a model.

Uses self as the base model and trains it on dataset to produce a trained model. Typically the fit method of the sklearn model is used.

Parameters: dataset (Dataset) – dataset to be trained on.
Returns: A trained Model.

predict(features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target[source]¶

Predict the target.

A model returned from Model.train() can predict the Target for a given Features. Typically the predict method of the sklearn model is used.

Parameters: features (Features) – the features used to predict the Target.
Returns: The prediction of Target.

class utensil.loopflow.functions.LoadData(dformat: str, url: str, target: str, features: Dict[int, str])[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Load a dataset from a URL.

URL can be a path. Data format can be SVMLIGHT.

dformat¶

Data format. Valid options are SVMLIGHT.

Todo

More formats are needed:

CSV
HDF5

Type: str

url¶

URL for the dataset. Should be a path or a URL with the scheme http, https or file.

Todo

More types are needed

sklearn data.

Type: str

target¶

The column of the dataset treated as a target.

Type: str

features¶

A mapping from the 0-based index of a column to its name. This is useful when the dataset itself does not contain its own column names, for example, in the svmlight format.

Type: dict[int, str]

main() → utensil.loopflow.functions.dataflow.Dataset[source]¶

class utensil.loopflow.functions.FilterRows(filter_by: Dict[str, Any])[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Filter rows of Dataset.

Filter rows of dataset by the value of its Target.

filter_by¶

Type: dict[str, Any]

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Dataset[source]¶

Parameters: dataset (Dataset) – the dataset to be filtered.
Returns: A filtered dataset.

class utensil.loopflow.functions.SamplingRows(number: int = MISSING, ratio: float = MISSING, stratified: bool = False, replace: bool = False, random_seed: Union[None, int] = None, return_rest: bool = False)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Sample rows of a dataset.

This method samples a dataset to a specific number of rows or to a ratio.

number¶

The sampled dataset will have this many rows. Overridden by ratio.

Type: int

ratio¶

The sampled dataset will have ratio * dataset.nrows rows. Overrides number.

Type: float, default 1.0 if number is not set

stratified¶

If True, the dataset will be sampled in a stratified manner. That is, there will be the same number of rows for each category of the dataset target, if possible.

Type: bool, default False

replace¶

Type: bool, default False

random_seed¶

Random seed used to sample the dataset. It is used to set numpy.random.BitGenerator. See Numpy Documentation for more information.

Type: None or int, default None

return_rest¶

If False, only the sampled dataset is returned.

If True, this method will return a dictionary of two datasets,

{
    'sampled': sampled_dataset,
    'rest': rest_dataset,
}

rest_dataset contains all rows not in sampled_dataset.

Note

Even if sampled_dataset is sampled with replacement, rest_dataset does not contain duplicated rows.

Type: bool, default False

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → Union[utensil.loopflow.functions.dataflow.Dataset, Dict[str, utensil.loopflow.functions.dataflow.Dataset]][source]¶

Parameters: dataset (Dataset) – the dataset to be sampled.
Returns: A sampled dataset or a dictionary of the sampled dataset and the rest dataset.

class utensil.loopflow.functions.MakeDataset[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make a dataset using target and features.

main(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Dataset[source]¶

Parameters

target (Target) – the input target.
features (Features) – the input features.

Returns

A dataset consisting of target and features.

class utensil.loopflow.functions.GetTarget[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get target from a dataset.

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Target[source]¶

Parameters: dataset (Dataset) – get target from this dataset.
Returns: The target of the dataset.

class utensil.loopflow.functions.GetFeature(feature: str)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Get feature from a dataset with a given name.

feature¶

This feature will be retrieved from the dataset.

Type: str

main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Feature[source]¶

Parameters: dataset (Dataset) – get feature from this dataset.
Returns: The feature with the given name of the dataset.

class utensil.loopflow.functions.MergeFeatures[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Merge a list of Feature objects into a Features object.

main(*features: utensil.loopflow.functions.dataflow.Feature) → utensil.loopflow.functions.dataflow.Features[source]¶

Parameters: *features (list of Feature) – the features to be merged.
Returns: A Features object containing the listed features.

class utensil.loopflow.functions.LinearNormalize(upper: Optional[Dict[str, Any]] = None, lower: Optional[Dict[str, Any]] = None)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Perform linear normalization of a 1d array.

Linearly maps the given array from range (u1, l1) to (u2, l2).

upper¶

Sets u1=upper["FROM"] and u2=upper["TO"]. Each of u1 and u2 must be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type: dict of FROM and TO, default both MAX

lower¶

Sets l1=lower["FROM"] and l2=lower["TO"]. Each of l1 and l2 must be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.

Type: dict of FROM and TO, default both MIN

main(arr1d: numpy.ndarray) → numpy.ndarray[source]¶
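
The mapping described above can be sketched with NumPy. This is an illustrative stand-in under stated assumptions, not the utensil implementation: the names `resolve_bound` and `linear_normalize` are hypothetical, the formula is the standard linear map between two ranges, and the sketch assumes u1 differs from l1.

```python
import numpy as np


def resolve_bound(spec, arr):
    """Turn a bound spec (a number, 'MAX' or 'MIN') into a float."""
    if isinstance(spec, str):
        if spec == "MAX":
            return float(np.max(arr))
        if spec == "MIN":
            return float(np.min(arr))
        raise ValueError(f"unsupported bound: {spec!r}")
    return float(spec)


def linear_normalize(arr1d, upper=None, lower=None):
    """Hypothetical stand-in: map arr1d linearly from (u1, l1) to (u2, l2)."""
    upper = upper or {"FROM": "MAX", "TO": "MAX"}
    lower = lower or {"FROM": "MIN", "TO": "MIN"}
    u1 = resolve_bound(upper["FROM"], arr1d)
    u2 = resolve_bound(upper["TO"], arr1d)
    l1 = resolve_bound(lower["FROM"], arr1d)
    l2 = resolve_bound(lower["TO"], arr1d)
    # Assumes u1 != l1; otherwise the source range is empty.
    return (arr1d - l1) / (u1 - l1) * (u2 - l2) + l2


arr = np.array([2.0, 4.0, 6.0])
# Map the array's own [min, max] range onto [0, 1].
scaled = linear_normalize(
    arr,
    upper={"FROM": "MAX", "TO": 1},
    lower={"FROM": "MIN", "TO": 0},
)
# With the default bounds the source and target ranges coincide.
unchanged = linear_normalize(arr)
```

With the documented defaults (both upper bounds MAX, both lower bounds MIN) the source and target ranges are the same, so the sketch returns the array unchanged.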

class utensil.loopflow.functions.MakeModel(method)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Make an untrained model.

method¶

The method the model uses to train. Options are XGBOOST_REGRESSOR, XGBOOST_CLASSIFIER, and SKLEARN_GRADIENT_BOOSTING_CLASSIFIER.

Type: str

main(model_params: Dict[str, Any]) → utensil.loopflow.functions.dataflow.Model[source]¶

Parameters

model_params (dict) –

The parameters to create the model. Based on the method, different parameters can be set.

XGBOOST_REGRESSOR (see the XGBoost documentation for details):
    - learning_rate
    - max_depth
    - n_estimators

XGBOOST_CLASSIFIER (see the XGBoost documentation for details):
    - learning_rate
    - max_depth
    - n_estimators

SKLEARN_GRADIENT_BOOSTING_CLASSIFIER (see the scikit-learn documentation for details):
    - learning_rate
    - max_depth
    - n_estimators

Returns

An untrained Model.

class utensil.loopflow.functions.Train[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Train a model.

main(model: utensil.loopflow.functions.dataflow.Model, dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model[source]¶

Parameters

model (Model) – The model to be trained.
dataset (Dataset) – The dataset to be trained on.

Returns

A trained Model.

class utensil.loopflow.functions.Predict[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Predict a target.

main(model: utensil.loopflow.functions.dataflow.Model, features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target[source]¶

Parameters

model (Model) – The model used to make the prediction.
features (Features) – The features used for prediction.

Returns

A Target based on the model and features. The length of the target is identical to the number of rows of the features.

class utensil.loopflow.functions.ParameterSearch(init_state=0, seed: int = 0, search_map: Optional[Dict] = None)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Randomly search the model parameters.

See more in utensil.random_search.

init_state¶

Type: int, default 0

seed¶

Type: int, default 0

search_map¶

Type: dict, default None

main()[source]¶

Returns: Next randomly generated parameters.
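
The idea of drawing the next set of parameters on each call can be sketched as follows. This is not the utensil implementation: the class name `RandomSearchSketch` and the `search_map` format (parameter name mapped to a `(low, high)` range) are assumptions made for illustration, and `init_state` is left out because its meaning is not documented here.

```python
import random


class RandomSearchSketch:
    """Hypothetical stand-in, not the utensil implementation."""

    def __init__(self, seed=0, search_map=None):
        # Assumed format: parameter name -> (low, high) range of floats.
        self.search_map = search_map or {}
        # A seeded generator makes the sequence of draws reproducible.
        self.rng = random.Random(seed)

    def main(self):
        # Draw one value per parameter, uniformly within its range.
        return {
            name: self.rng.uniform(low, high)
            for name, (low, high) in self.search_map.items()
        }


search_map = {"learning_rate": (0.01, 0.3)}
first = RandomSearchSketch(seed=0, search_map=search_map).main()
again = RandomSearchSketch(seed=0, search_map=search_map).main()
```

Two searches built with the same seed and search map produce the same first draw, which is the usual reason for exposing a seed.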

class utensil.loopflow.functions.Score(dataset: str = MISSING, methods: Optional[Union[str, List[str]]] = None)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Calculate scores of a model, based on its prediction and a ground truth.

dataset¶

The name of the dataset. It is used to generate an informative output.

Type: str

methods¶

The method or a list of methods to score a model. Options are ACCURACY.

Type: str or list of str

Parameters

prediction (Target, Features or Dataset) –
    If prediction is a Target, it is used directly to calculate the score, without using the model.

    If it is a Features, the model makes a prediction based on it.

    If it is a Dataset, the model makes a prediction based on its features.
ground_truth (Target or Dataset) –
    If ground_truth is a Target, it is compared directly to the prediction.

    If ground_truth is a Dataset, its target is compared to the prediction.
model (Model) –
    The model to be scored.

Note

If prediction is a Target, the model is not used.

Returns

A list of scoring results. Each scoring result consists of two or three attributes: the scoring method name, the dataset name (if provided), and the score.

For example:

# if dataset name is 'MNIST'
[
    ('ACCURACY', 'MNIST', 0.812641),
    ('FSCORE', 'MNIST', 0.713278),
]

# if dataset name is not provided
[
    ('ACCURACY', 0.812641),
    ('FSCORE', 0.713278),
]
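
The shape of the returned list can be sketched in plain Python. This is an illustration only, not the utensil implementation: the helpers `accuracy` and `score` are hypothetical, and accuracy is taken to be the fraction of matching labels over a non-empty ground truth.

```python
def accuracy(prediction, ground_truth):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(prediction) != len(ground_truth):
        raise ValueError("prediction and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(p == t for p, t in zip(prediction, ground_truth))
    return hits / len(ground_truth)


def score(prediction, ground_truth, dataset=None):
    """Hypothetical stand-in: build result tuples in the documented shape."""
    value = accuracy(prediction, ground_truth)
    if dataset is None:
        # Two attributes: method name and score.
        return [("ACCURACY", value)]
    # Three attributes: method name, dataset name and score.
    return [("ACCURACY", dataset, value)]


named = score([1, 0, 1, 1], [1, 0, 0, 1], dataset="MNIST")
unnamed = score([1, 0, 1, 1], [1, 0, 0, 1])
```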

class utensil.loopflow.functions.ChangeTypeTo(to_type: str)[source]¶

Bases: utensil.loopflow.loopflow.NodeProcessFunction

Change the type of a given array.

to_type¶

The type to convert arr to. Options are INTEGER, FLOAT.

Type: str

main(arr: Union[utensil.loopflow.functions.dataflow.Feature, utensil.loopflow.functions.dataflow.Target])[source]¶

Parameters: arr (Feature or Target) – The array whose type will be changed.
Returns: The arr with its type changed to to_type.
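
A type change of this kind can be sketched with NumPy. This is not the utensil implementation: the function `change_type_to` and the mapping from the option names to NumPy dtypes are assumptions, and the sketch works on a plain array instead of a Feature or Target.

```python
import numpy as np

# Assumed mapping from the documented option names to NumPy dtypes.
TYPE_MAP = {"INTEGER": np.int64, "FLOAT": np.float64}


def change_type_to(arr, to_type):
    """Hypothetical stand-in: cast arr to the dtype named by to_type."""
    if to_type not in TYPE_MAP:
        raise ValueError(f"unsupported to_type: {to_type!r}")
    return np.asarray(arr).astype(TYPE_MAP[to_type])


# Casting floats to integers with astype truncates toward zero.
as_int = change_type_to(np.array([1.0, 2.9, -3.2]), "INTEGER")
as_float = change_type_to(np.array([1, 2, 3]), "FLOAT")
```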
