utensil.loopflow.functions package
Submodules
utensil.loopflow.functions.basic module
Provides NodeProcessFunction implementations for basic usage.
Example:
from utensil.loopflow.functions import basic
from utensil.loopflow.loopflow import register_node_process_functions
register_node_process_functions(basic)
- utensil.loopflow.functions.basic.MISSING = MISSING
Missing token.
A sentinel used to indicate a missing value.
- class utensil.loopflow.functions.basic.Dummy
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Identity function.
Returns whatever it gets.
>>> Dummy().main('anything')
'anything'
- class utensil.loopflow.functions.basic.Default(default)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Implements a default behavior.
Returns the default value if triggered before receiving any input.
- default
The default value.
>>> default = Default('my_default')
This returns the input:
>>> default.main('my_input')
'my_input'
This returns the default value:
>>> default.main()
'my_default'
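The default behavior can be sketched as follows, assuming that "no input yet" is signaled by a sentinel object rather than by None, so that None remains a valid input. DefaultSketch and the local MISSING object are hypothetical stand-ins, not the library's implementation.

```python
# Hypothetical stand-in for the sentinel; the real token is
# utensil.loopflow.functions.basic.MISSING.
MISSING = object()


class DefaultSketch:
    """Minimal sketch of Default: return the input, or a default if none arrived."""

    def __init__(self, default):
        self.default = default

    def main(self, value=MISSING):
        # No input yet: fall back to the configured default.
        if value is MISSING:
            return self.default
        return value


default = DefaultSketch('my_default')
print(default.main('my_input'))  # my_input
print(default.main())            # my_default
```

Using a dedicated sentinel instead of None means an explicit `main(None)` still passes None through.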
- class utensil.loopflow.functions.basic.Add(a)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Adds a predefined constant, i.e., returns n + a for an input n.
- a
The constant value to be added.
>>> p = Add(3)
>>> p.main(5)
8
>>> p.main(9)
12
- namedtuple utensil.loopflow.functions.basic.ConditionValue(c, v)
Bases:
namedtuple()
A pair of a boolean and a value for flow control.
- c
A boolean indicating whether the condition is satisfied.
- v
The value to be used.
ConditionValue(c, v)
- Fields
c – Alias for field number 0
v – Alias for field number 1
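A short illustration of how a ConditionValue pair can be consumed. It re-creates the documented fields locally with collections.namedtuple, so the class here is a stand-in; how a loopflow graph actually routes on c is not shown in this section and is not modeled.

```python
from collections import namedtuple

# Local re-creation of the documented fields, for illustration only.
ConditionValue = namedtuple('ConditionValue', ['c', 'v'])

cv = ConditionValue(c=True, v=3)

# Fields are reachable by name or by position (c is field 0, v is field 1).
assert cv.c is cv[0]
assert cv.v == cv[1]

# A downstream step can unpack the pair and branch on the condition.
passed, value = cv
if passed:
    print(f'condition passed with value {value}')
else:
    print(f'condition failed with value {value}')
```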
- class utensil.loopflow.functions.basic.LessEqual(a)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Checks whether the input is less than or equal to a constant, i.e., b <= a.
- a
The constant value to be compared with.
>>> LessEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> LessEqual(5).main(10)
ConditionValue(c=False, v=10)
- main(b) → utensil.loopflow.functions.basic.ConditionValue
- Parameters
b – value to be compared with a.
- Returns
A ConditionValue whose c is True if b <= a, and whose v is b.
- class utensil.loopflow.functions.basic.Equal(a)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Checks whether the input is equal to a constant, i.e., b == a.
- a
The constant value to be compared with.
>>> Equal(3).main(3)
ConditionValue(c=True, v=3)
>>> Equal(5).main(10)
ConditionValue(c=False, v=10)
- main(b) → utensil.loopflow.functions.basic.ConditionValue
- Parameters
b – value to be compared with a.
- Returns
A ConditionValue whose c is True if b == a, and whose v is b.
- class utensil.loopflow.functions.basic.GreaterEqual(a)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Checks whether the input is greater than or equal to a constant, i.e., b >= a.
- a
The constant value to be compared with.
>>> GreaterEqual(3).main(3)
ConditionValue(c=True, v=3)
>>> GreaterEqual(15).main(10)
ConditionValue(c=False, v=10)
- main(b) → utensil.loopflow.functions.basic.ConditionValue
- Parameters
b – value to be compared with a.
- Returns
A ConditionValue whose c is True if b >= a, and whose v is b.
- class utensil.loopflow.functions.basic.LessThan(a)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Checks whether the input is less than a constant, i.e., b < a.
- a
The constant value to be compared with.
>>> LessThan(3).main(3)
ConditionValue(c=False, v=3)
>>> LessThan(15).main(10)
ConditionValue(c=True, v=10)
- main(b) → utensil.loopflow.functions.basic.ConditionValue
- Parameters
b – value to be compared with a.
- Returns
A ConditionValue whose c is True if b < a, and whose v is b.
- class utensil.loopflow.functions.basic.GreaterThan(a)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Checks whether the input is greater than a constant, i.e., b > a.
- a
The constant value to be compared with.
>>> GreaterThan(3).main(3)
ConditionValue(c=False, v=3)
>>> GreaterThan(5).main(10)
ConditionValue(c=True, v=10)
- main(b) → utensil.loopflow.functions.basic.ConditionValue
- Parameters
b – value to be compared with a.
- Returns
A ConditionValue whose c is True if b > a, and whose v is b.
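The five comparison classes share one pattern: apply an operator with the input b on the left and the constant a on the right, then return the result together with b. The sketch below models that pattern with the standard operator module; CompareSketch is a hypothetical generic class, not how the library necessarily implements these classes.

```python
import operator
from collections import namedtuple

ConditionValue = namedtuple('ConditionValue', ['c', 'v'])


class CompareSketch:
    """Hypothetical generic comparison: c is op(b, a), v is the input b."""

    def __init__(self, a, op):
        self.a = a
        self.op = op

    def main(self, b):
        return ConditionValue(c=self.op(b, self.a), v=b)


# Each documented class corresponds to one operator, with b on the left.
less_equal = CompareSketch(3, operator.le)      # like LessEqual(3):    b <= 3
equal = CompareSketch(3, operator.eq)           # like Equal(3):        b == 3
greater_equal = CompareSketch(15, operator.ge)  # like GreaterEqual(15): b >= 15
less_than = CompareSketch(15, operator.lt)      # like LessThan(15):    b < 15
greater_than = CompareSketch(5, operator.gt)    # like GreaterThan(5):  b > 5

print(less_equal.main(3))      # ConditionValue(c=True, v=3)
print(greater_equal.main(10))  # ConditionValue(c=False, v=10)
print(less_than.main(10))      # ConditionValue(c=True, v=10)
print(greater_than.main(10))   # ConditionValue(c=True, v=10)
```

The operand order matters: LessThan(15).main(10) is True because the check is 10 < 15, not 15 < 10.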
utensil.loopflow.functions.dataflow module
Provides NodeProcessFunction implementations for machine learning workflows.
Example:
from utensil.loopflow.functions import dataflow
from utensil.loopflow.loopflow import register_node_process_functions
register_node_process_functions(dataflow)
- class utensil.loopflow.functions.dataflow.Feature(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)
Bases:
pandas.core.series.Series
A feature of a dataset.
A Feature is an individual measurable property or characteristic of a phenomenon. It can be a list of numbers or strings, with or without missing values. The length of a feature (missing values included) should equal the number of instances in the dataset.
- class utensil.loopflow.functions.dataflow.Features(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)
Bases:
pandas.core.frame.DataFrame
A list of features.
Features is a list of Feature. It can be represented as a matrix of numbers, strings, missing values, etc.
- class utensil.loopflow.functions.dataflow.Target(data=None, index=None, dtype: Dtype | None = None, name=None, copy: bool = False, fastpath: bool = False)
Bases:
pandas.core.series.Series
The target of a dataset.
Target is the output associated with the input variables. Typically, it is the variable a supervised model is trying to learn to predict, either numerical or categorical.
- class utensil.loopflow.functions.dataflow.Dataset(target: utensil.loopflow.functions.dataflow.Target, features: utensil.loopflow.functions.dataflow.Features)
Bases:
object
A dataset used to train a model or to let a model predict its target.
A pair of Target and Features. In the supervised case, use both the target and the features to train or score a model; to predict only, use only the features. The length of the target should equal the length of every feature in features, i.e., the number of instances.
>>> dataset = Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> dataset.nrows
3
>>> dataset.ncols
4
>>> bad_dataset = Dataset(
...     Target(np.random.randint(2, size=2)),
...     Features(np.random.random(size=(3, 4)))
... )
>>> bad_dataset.nrows
Traceback (most recent call last):
    ...
ValueError: rows of target and that of features should be the same
- target: utensil.loopflow.functions.dataflow.Target
The target of the dataset.
- features: utensil.loopflow.functions.dataflow.Features
The features of the dataset.
- property nrows
Number of rows/instances.
- property ncols
Number of columns/features.
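A minimal sketch of the row-count check, assuming (as the doctest suggests) that the mismatch is reported when nrows is read rather than at construction time. DatasetSketch is a hypothetical class built on plain pandas objects; it only mirrors the documented error message.

```python
import numpy as np
import pandas as pd


class DatasetSketch:
    """Hypothetical pairing of a target Series with a features DataFrame."""

    def __init__(self, target: pd.Series, features: pd.DataFrame):
        self.target = target
        self.features = features

    @property
    def nrows(self) -> int:
        # The target and every feature column must describe the same instances.
        if len(self.target) != self.features.shape[0]:
            raise ValueError('rows of target and that of features should be the same')
        return self.features.shape[0]

    @property
    def ncols(self) -> int:
        return self.features.shape[1]


rng = np.random.default_rng(0)
ok = DatasetSketch(pd.Series(rng.integers(2, size=3)), pd.DataFrame(rng.random((3, 4))))
print(ok.nrows, ok.ncols)  # 3 4
```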
- class utensil.loopflow.functions.dataflow.Model
Bases:
object
A base model class that can be trained on a dataset and then used to predict the target.
Before Model.train() is called, the model is untrained and should not be used to predict. After that, Model.predict() can be called to predict the Target of Features.
- train(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model
Train a model.
Uses self as the base model, trains it on dataset, and returns the trained model. Should be overridden by subclasses.
>>> Model().train(Dataset(
...     Target(np.random.randint(2, size=3)),
...     Features(np.random.random(size=(3, 4)))
... ))
Traceback (most recent call last):
    ...
NotImplementedError
- predict(features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target
Predict the target.
A model returned from Model.train() can predict the Target for given Features. Should be overridden by subclasses.
>>> Model().predict(Features(np.random.random(size=(3, 4))))
Traceback (most recent call last):
    ...
NotImplementedError
- class utensil.loopflow.functions.dataflow.SklearnModel(model)
Bases:
utensil.loopflow.functions.dataflow.Model
A wrapper for sklearn models.
>>> from sklearn.linear_model import LinearRegression
>>> model = SklearnModel(LinearRegression())
>>> target = Target([1, 2, 3])
>>> features = Features([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> model = model.train(Dataset(target, features))
>>> model.predict(features + 1)
0    1.25
1    2.25
2    3.25
dtype: float64
- train(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Model
Train a model.
Uses self as the base model, trains it on dataset, and returns the trained model. Typically, the fit method of the sklearn model is used.
- predict(features: utensil.loopflow.functions.dataflow.Features) → utensil.loopflow.functions.dataflow.Target
Predict the target.
A model returned from Model.train() can predict the Target for given Features. Typically, the predict method of the sklearn model is used.
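A sketch of the wrapper idea, under two assumptions not stated above: training fits a deep copy so the untrained base model stays reusable, and train takes the target and features directly instead of a Dataset to keep the example short. SklearnModelSketch and the toy MeanRegressor are hypothetical; any object with fit(X, y) and predict(X) could take the place of a real sklearn estimator.

```python
import copy

import pandas as pd


class SklearnModelSketch:
    """Hypothetical wrapper around any estimator with fit(X, y) and predict(X)."""

    def __init__(self, model):
        self.model = model

    def train(self, target: pd.Series, features: pd.DataFrame) -> 'SklearnModelSketch':
        # Fit a copy so the untrained base model stays reusable.
        fitted = copy.deepcopy(self.model)
        fitted.fit(features, target)
        return SklearnModelSketch(fitted)

    def predict(self, features: pd.DataFrame) -> pd.Series:
        # Wrap the raw output back into a pandas Series aligned with the input rows.
        return pd.Series(self.model.predict(features), index=features.index)


class MeanRegressor:
    """Toy estimator (hypothetical) that always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(pd.Series(y).mean())
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


base = SklearnModelSketch(MeanRegressor())
trained = base.train(pd.Series([1, 2, 3]), pd.DataFrame([[1, 2], [3, 4], [5, 6]]))
print(trained.predict(pd.DataFrame([[0, 0], [9, 9]])).tolist())  # [2.0, 2.0]
```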
- class utensil.loopflow.functions.dataflow.LoadData(dformat: str, url: str, target: str, features: Dict[int, str])
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Loads a dataset from a URL.
The URL can also be a local path. The data format can be SVMLIGHT.
- dformat
Data format. Valid options are SVMLIGHT.
Todo
More formats are needed:
CSV
HDF5
- Type
str
- url
URL of the dataset. Should be a path or a URL with the scheme http, https or file.
Todo
More types are needed:
sklearn data.
- Type
str
- target
The column of the dataset treated as the target.
- Type
str
- features
A mapping from the 0-based column index to the column name. This is useful when the dataset does not contain its own column names, for example, in the svmlight format.
- Type
dict[int, str]
- class utensil.loopflow.functions.dataflow.FilterRows(filter_by: Dict[str, Any])
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Filters rows of a Dataset.
Filters the rows of a dataset by the values of its Target.
- filter_by
Indicates which column to filter by and which values to keep. Typical usage is to filter TARGET with a list of values. For example, filter_by={"TARGET": [1, 2]} keeps only the rows whose target is 1 or 2.
- Type
dict[str, Any]
- main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Dataset
- Parameters
dataset (Dataset) – the dataset to be filtered.
- Returns
A filtered dataset.
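The typical filter_by={"TARGET": [1, 2]} case can be sketched with pandas Series.isin, applying the same boolean mask to the target and the features so their rows stay aligned. filter_rows_sketch is a hypothetical function that handles only the TARGET key and returns a (target, features) pair instead of a Dataset.

```python
import pandas as pd


def filter_rows_sketch(target: pd.Series, features: pd.DataFrame, filter_by: dict):
    """Hypothetical FilterRows: keep rows whose target is in filter_by['TARGET']."""
    allowed = filter_by['TARGET']
    mask = target.isin(allowed)
    # Apply the same boolean mask to both parts so rows stay aligned.
    return target[mask], features[mask]


target = pd.Series([0, 1, 2, 3, 1])
features = pd.DataFrame({'x': [10, 11, 12, 13, 14]})
t, f = filter_rows_sketch(target, features, {'TARGET': [1, 2]})
print(t.tolist(), f['x'].tolist())  # [1, 2, 1] [11, 12, 14]
```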
- class utensil.loopflow.functions.dataflow.SamplingRows(number: int = MISSING, ratio: float = MISSING, stratified: bool = False, replace: bool = False, random_seed: Union[None, int] = None, return_rest: bool = False)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Samples rows of a dataset.
Samples a dataset down to a specific number of rows or to a ratio of its rows.
- number
The sampled dataset will have this many rows. Overridden by ratio.
- Type
int
- ratio
The sampled dataset will have ratio * dataset.nrows rows. Overrides number.
- Type
float, default 1.0 if number is not set
- stratified
If True, the dataset is sampled in a stratified manner. That is, each category of the dataset target will have the same number of rows, if possible.
- Type
bool, default False
- replace
If True, the dataset is sampled with replacement, and a row may be selected multiple times. An exception is raised if replace is False and number is larger than dataset.nrows or ratio is larger than 1.
- Type
bool, default False
- random_seed
Random seed used to sample the dataset. It is used to seed numpy.random.BitGenerator. See the NumPy documentation for more information.
- Type
None or int, default None
- return_rest
If False, only the sampled dataset is returned.
If True, a dictionary of two datasets is returned,
{'sampled': sampled_dataset, 'rest': rest_dataset}
where rest_dataset contains all rows not in sampled_dataset.
Note
Even if sampled_dataset is sampled with replacement, rest_dataset does not contain duplicated rows.
- Type
bool, default False
- main(dataset: utensil.loopflow.functions.dataflow.Dataset) → Union[utensil.loopflow.functions.dataflow.Dataset, Dict[str, utensil.loopflow.functions.dataflow.Dataset]]
- Parameters
dataset (Dataset) – the dataset to be sampled.
- Returns
A sampled dataset, or a dictionary of the sampled dataset and the rest dataset.
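A sketch of the documented number/ratio/replace/return_rest rules using numpy.random.Generator.choice on row positions. It is hypothetical and deliberately simplified: it samples a single DataFrame, uses None instead of MISSING for unset arguments, and omits the stratified option.

```python
import numpy as np
import pandas as pd


def sample_rows_sketch(features: pd.DataFrame, number=None, ratio=None,
                       replace=False, random_seed=None, return_rest=False):
    """Hypothetical non-stratified version of SamplingRows on a DataFrame."""
    nrows = len(features)
    if ratio is not None:          # ratio overrides number
        size = int(ratio * nrows)
    elif number is not None:
        size = number
    else:                          # neither set: ratio defaults to 1.0
        size = nrows
    if not replace and size > nrows:
        raise ValueError('cannot sample more rows than available without replacement')

    rng = np.random.default_rng(random_seed)
    positions = rng.choice(nrows, size=size, replace=replace)
    sampled = features.iloc[positions]
    if not return_rest:
        return sampled
    # The rest holds each unselected row once, even if sampling used replacement.
    rest_mask = np.ones(nrows, dtype=bool)
    rest_mask[positions] = False
    return {'sampled': sampled, 'rest': features.iloc[np.flatnonzero(rest_mask)]}


df = pd.DataFrame({'x': range(10)})
parts = sample_rows_sketch(df, ratio=0.3, random_seed=42, return_rest=True)
print(len(parts['sampled']), len(parts['rest']))  # 3 7
```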
- class utensil.loopflow.functions.dataflow.MakeDataset
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Makes a dataset from a target and features.
- class utensil.loopflow.functions.dataflow.GetTarget
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Gets the target from a dataset.
- main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Target
- Parameters
dataset (Dataset) – get the target from this dataset.
- Returns
The target of the dataset.
- class utensil.loopflow.functions.dataflow.GetFeature(feature: str)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Gets the feature with a given name from a dataset.
- feature
The name of the feature to retrieve from the dataset.
- Type
str
- main(dataset: utensil.loopflow.functions.dataflow.Dataset) → utensil.loopflow.functions.dataflow.Feature
- Parameters
dataset (Dataset) – get the feature from this dataset.
- Returns
The feature of the dataset with the given name.
- class utensil.loopflow.functions.dataflow.MergeFeatures
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Merges a list of Feature into Features.
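Since Feature is a pandas Series and Features a pandas DataFrame, merging and retrieving features can be pictured with plain pandas. This sketch assumes features are merged column-wise by name with pandas.concat; the library's MergeFeatures and GetFeature may handle naming or alignment differently.

```python
import pandas as pd

# Two named features (Series) of equal length, i.e. the same instances.
age = pd.Series([31, 45, 27], name='age')
height = pd.Series([1.70, 1.82, 1.65], name='height')

# Hypothetical MergeFeatures: place each Series side by side as a column.
features = pd.concat([age, height], axis=1)
print(list(features.columns))  # ['age', 'height']

# Hypothetical GetFeature('height'): retrieve one column by name.
print(features['height'].tolist())  # [1.7, 1.82, 1.65]
```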
- class utensil.loopflow.functions.dataflow.LinearNormalize(upper: Optional[Dict[str, Any]] = None, lower: Optional[Dict[str, Any]] = None)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Performs linear normalization of a 1-D array.
Linearly maps the given array from the range (u1, l1) to (u2, l2).
- upper
Sets u1=upper["FROM"] and u2=upper["TO"]. Each u* should be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.
- Type
dict of FROM and TO, default both MAX
- lower
Sets l1=lower["FROM"] and l2=lower["TO"]. Each l* should be a number or one of the strings MAX or MIN. MAX means the maximum of the array, and MIN means the minimum of the array.
- Type
dict of FROM and TO, default both MIN
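A linear map that sends l1 to l2 and u1 to u2 is x' = l2 + (x - l1) * (u2 - l2) / (u1 - l1). With the defaults (both upper bounds MAX, both lower bounds MIN) this reduces to the identity. The sketch below implements that formula, assuming both keys are present when a dict is passed and that u1 differs from l1; linear_normalize_sketch and resolve are hypothetical helpers.

```python
import numpy as np


def resolve(bound, arr):
    """Turn 'MAX'/'MIN' into the array's extreme; pass numbers through."""
    if bound == 'MAX':
        return float(np.max(arr))
    if bound == 'MIN':
        return float(np.min(arr))
    return float(bound)


def linear_normalize_sketch(arr, upper=None, lower=None):
    """Hypothetical LinearNormalize: map [l1, u1] linearly onto [l2, u2]."""
    arr = np.asarray(arr, dtype=float)
    upper = upper or {'FROM': 'MAX', 'TO': 'MAX'}
    lower = lower or {'FROM': 'MIN', 'TO': 'MIN'}
    u1, u2 = resolve(upper['FROM'], arr), resolve(upper['TO'], arr)
    l1, l2 = resolve(lower['FROM'], arr), resolve(lower['TO'], arr)
    # l1 maps to l2 and u1 maps to u2; u1 must differ from l1.
    return l2 + (arr - l1) * (u2 - l2) / (u1 - l1)


# Min-max scaling to [0, 1]: MAX -> 1 and MIN -> 0.
x = [2.0, 4.0, 6.0, 10.0]
scaled = linear_normalize_sketch(x, upper={'FROM': 'MAX', 'TO': 1},
                                 lower={'FROM': 'MIN', 'TO': 0})
print(scaled.tolist())  # [0.0, 0.25, 0.5, 1.0]
```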
- class utensil.loopflow.functions.dataflow.MakeModel(method)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Makes an untrained model.
- method
The method the model uses to train. Options are XGBOOST_REGRESSOR, XGBOOST_CLASSIFIER and SKLEARN_GRADIENT_BOOSTING_CLASSIFIER.
- Type
str
- main(model_params: Dict[str, Any]) → utensil.loopflow.functions.dataflow.Model
- Parameters
model_params (dict) –
The parameters used to create the model. The parameters that can be set depend on the method.
XGBOOST_REGRESSOR: see the XGBoost documentation for details.
learning_rate
max_depth
n_estimators
XGBOOST_CLASSIFIER: see the XGBoost documentation for details.
learning_rate
max_depth
n_estimators
SKLEARN_GRADIENT_BOOSTING_CLASSIFIER: see the scikit-learn documentation for details.
learning_rate
max_depth
n_estimators
- Returns
An untrained Model.
- class utensil.loopflow.functions.dataflow.Train
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Trains a model.
- class utensil.loopflow.functions.dataflow.Predict
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Predicts a target.
- class utensil.loopflow.functions.dataflow.ParameterSearch(init_state=0, seed: int = 0, search_map: Optional[Dict] = None)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Randomly searches the model parameters.
See utensil.random_search for more details.
- init_state
- Type
int, default 0
- seed
- Type
int, default 0
- search_map
- Type
dict, default None
- class utensil.loopflow.functions.dataflow.Score(dataset: str = MISSING, methods: Optional[Union[str, List[str]]] = None)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Calculates the scores of a model based on its prediction and the ground truth.
- dataset
The name of the dataset. It is used to make the output more informative.
- Type
str
- methods
The method or list of methods used to score a model. Options are ACCURACY.
- Type
str or list of str
- main(prediction: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Features, utensil.loopflow.functions.dataflow.Dataset], ground_truth: Union[utensil.loopflow.functions.dataflow.Target, utensil.loopflow.functions.dataflow.Dataset], model: utensil.loopflow.functions.dataflow.Model)
- Parameters
prediction (target, features or dataset) –
If prediction is a Target, it is used directly to calculate the score without using the model.
If it is Features, the model makes a prediction based on it.
If it is a Dataset, the model makes a prediction based on its features.
ground_truth (target or dataset) –
If ground_truth is a Target, it is compared directly to the prediction.
If ground_truth is a Dataset, its target is compared to the prediction.
model (Model) – The model to be scored.
Note
If prediction is a Target, the model is not used.
- Returns
A list of scoring results. Each result consists of two or three items: the scoring method name, the dataset name (if provided), and the score.
For example:
# if the dataset name is 'MNIST'
[
    ('ACCURACY', 'MNIST', 0.812641),
    ('FSCORE', 'MNIST', 0.713278),
]
# if the dataset name is not provided
[
    ('ACCURACY', 0.812641),
    ('FSCORE', 0.713278),
]
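A sketch of how the two- or three-item result tuples can be built, covering only the case where prediction is already a Target (so the model is unused) and only the ACCURACY method. score_sketch is hypothetical and uses None instead of MISSING to mean "no dataset name".

```python
def score_sketch(prediction, ground_truth, methods, dataset=None):
    """Hypothetical Score: build (method, [dataset,] score) tuples."""
    if isinstance(methods, str):
        methods = [methods]
    results = []
    for method in methods:
        if method != 'ACCURACY':
            raise ValueError(f'unsupported method in this sketch: {method}')
        # Accuracy: fraction of positions where prediction equals ground truth.
        matches = sum(p == t for p, t in zip(prediction, ground_truth))
        score = matches / len(ground_truth)
        # The dataset name is inserted only when it was provided.
        results.append((method, dataset, score) if dataset is not None else (method, score))
    return results


print(score_sketch([1, 0, 1, 1], [1, 1, 1, 0], 'ACCURACY', dataset='MNIST'))
# [('ACCURACY', 'MNIST', 0.5)]
print(score_sketch([1, 0, 1, 1], [1, 1, 1, 0], ['ACCURACY']))
# [('ACCURACY', 0.5)]
```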
- class utensil.loopflow.functions.dataflow.ChangeTypeTo(to_type: str)
Bases:
utensil.loopflow.loopflow.NodeProcessFunction
Changes the type of a given array.
- to_type
The array is converted to this type. Options are INTEGER, FLOAT.
- Type
str
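A sketch of the type change with pandas Series.astype, assuming INTEGER maps to int64 and FLOAT to float64; that mapping and change_type_to_sketch are hypothetical. Casting floats to int64 truncates toward zero, and an array with missing values would need a nullable dtype such as 'Int64' instead.

```python
import pandas as pd

# Hypothetical mapping from the documented option names to pandas dtypes.
TYPE_MAP = {'INTEGER': 'int64', 'FLOAT': 'float64'}


def change_type_to_sketch(arr: pd.Series, to_type: str) -> pd.Series:
    """Hypothetical ChangeTypeTo: cast a Series to the requested type."""
    try:
        dtype = TYPE_MAP[to_type]
    except KeyError:
        raise ValueError(f'unknown type option: {to_type}') from None
    return arr.astype(dtype)


print(change_type_to_sketch(pd.Series([1.9, 2.0, -3.7]), 'INTEGER').tolist())
# [1, 2, -3]
```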
Module contents
The package utensil.loopflow.functions also exposes the following classes at the package level. Their documentation is identical to the submodule entries above; for example, utensil.loopflow.functions.Dummy is documented as utensil.loopflow.functions.basic.Dummy.
From utensil.loopflow.functions.basic:
- Dummy, Default, Add, LessEqual, Equal, GreaterEqual, LessThan, GreaterThan
From utensil.loopflow.functions.dataflow:
- Feature, Features, Target, Dataset, Model, SklearnModel, LoadData, FilterRows, SamplingRows, MakeDataset, GetTarget, GetFeature, MergeFeatures, LinearNormalize, MakeModel, Train, Predict, ParameterSearch, Score, ChangeTypeTo