The core of maf, an environment for computational experimentation on waf.
This module contains the core functionality of maf that handles parameterized tasks and meta nodes.
Bases: object
Object representing one call of ExperimentContext.__call__().
List of parameters indicated by the taskgen call.
Bases: exceptions.Exception
Exception raised when the experiment graph has a cycle.
Bases: waflib.Options.OptionsContext
ExperimentContext-specific OptionsContext.
Please extend the __init__ method below to add new options.
Bases: waflib.Build.BuildContext
Context class of waf experiment (a.k.a. maf).
Bases: object
Bipartite graph consisting of meta node and call object node.
Adds a call object node, its related meta nodes, and edges.
Parameters: call_object (CallObject) – Call object to be added.
Runs topological sort on the experiment graph.
Returns: A topologically sorted list of call objects.
Return type: list of CallObject
Bases: object
A wrapper of a Node object used in ExperimentTask as a replacement for input/output Nodes.
The main motivation of this class is to make it easy to write unit tests for user-defined rules. In maf, users can define their own rule by writing a function that receives the task object as an argument and then reads (writes) an input (output) Node object through accessors like task.inputs[0].read. Because the received task is generated by maf internally, users would otherwise have to write a mock object that mimics the behavior of the Task object in order to test such functions, which is tedious. ExperimentNode relieves this problem.
This Node wrapper behaves in two different ways. In an ordinary Task (the usual case), it is a mere wrapper of the Node object given in the constructor; the commonly used methods read, write, and abspath behave in the same way as those of an ordinary Node object. At test time, a user can get a dummy Node object by constructing this class with no argument. In that case, the class creates a temporary file and keeps it internally; read and write operate on this temporary file, which saves the labor of defining dummy Node objects for each rule. This class abstracts away the difference between these two cases.
Example usages of this class in test cases can be found in, for example, tests/test_rule.py. See also test.TestTask().
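As an illustration, here is a minimal sketch of the dummy-node behavior described above, assuming only the documented no-argument constructor and the read/write/abspath methods:
from maflib.core import ExperimentNode

# With no constructor argument, the node is backed by a temporary file.
node = ExperimentNode()
node.write('hello maf')
assert node.read() == 'hello maf'
print(node.abspath())  # path of the temporary file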
Bases: waflib.Task.Task
A task class specific for ExperimentContext.
The purpose of this class is to carry the parameter as an attribute. The base class (waflib.Task.Task) does not carry attributes other than env, and env must be a string-valued dictionary, which is problematic when we want to use the parameter object as it is. For example, a float value loses some information once it is converted to a string.
Another motivation for this class is to control the hash value of a task: the hash is calculated from the env entries whose keys are registered in vars or dep_vars. In __init__, this task registers the necessary keys to dep_vars.
Parameter whose values are not stringized.
Supports pipe-style rule strings by default.
Calculates the hash value of this task.
Overridden from waflib.Task.Task to use _node_sig to calculate the hash value of source/target files.
List of parameters each of which is the parameter of the corresponding input node.
Bases: maflib.core.ExperimentContext
Outputs a graph of dependencies between tasks.
Bases: object
A collection of meta nodes.
This class is essentially a hash table that groups node ids sharing the same meta node signature. The meta node signature is calculated by GraphContext._extract_meta_node().
Bases: object
A collection of meta classes similar to MetaNodes.
Bases: object
Indexer assigning a unique id to each Node instance.
Because each Node instance has a unique absolute path, Node -> id mappings are managed with a dictionary of type dict(str, id) preserving the correspondence between a path and an id.
See waflib.Context.Context.execute().
Bases: exceptions.Exception
Exception raised when the arguments of ExperimentContext.__call__ are wrong.
Bases: maflib.core.ExperimentContext
Bases: dict
Parameter of maf task.
This is a dict that also supports hash(). Be careful when using it with set(); Parameter supports hash() but is mutable.
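A small sketch of this caveat, assuming Parameter is constructed like an ordinary dict and that its hash is computed from its contents:
from maflib.core import Parameter

p = Parameter({'alpha': 0.1, 'fold': 0})
q = Parameter({'alpha': 0.1, 'fold': 0})
assert hash(p) == hash(q)  # equal contents are expected to hash equally
s = {p}                    # usable as a set element ...
p['fold'] = 1              # ... but mutating it afterwards breaks the set's invariants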
Checks whether this parameter conflicts with another given parameter.
Returns: True if self conflicts with parameter, i.e. they contain different values for the same key.
Return type: bool
Gets a dictionary with stringized values.
Returns: A dictionary with the same keys and stringized values.
Return type: dict of str keys and str values
Bases: object
Consistent generator of physical node identifiers corresponding to their parameters.
A meta node has a path and its own parameters, each of which corresponds to one physical waf node named 'path/N', where N is a unique name of the parameter. The correspondence between a parameter and its name must be consistent over multiple executions of waf, so the table is serialized to a hidden file.
This class also dumps the correspondences to a human-readable text file, with one tab-separated line per correspondence: the first element is an identifier and the second is a JSON representation of the corresponding parameter.
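For illustration only, two lines of such a dump file might look like the following (the identifiers and parameters are made up):
0	{"alpha": 0.1, "fold": 0}
1	{"alpha": 0.1, "fold": 1}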
NOTE: If an exception is raised during task generation, save() must be called to avoid inconsistency in the node names that were generated before the exception was raised.
Gets the parameter of a given id.
Parameters: parameter_id – Id of the parameter.
Returns: Parameter object of the given id.
Return type: Parameter
Gets the id of a given parameter.
Parameters: parameter (Parameter) – Parameter object.
Returns: Identifier of the given parameter. The id may be generated in this method if necessary.
Return type: str
Path to file that the table is serialized to.
Serializes the table to the file at self.path.
Path to file that the table is dumped to as a human-readable text.
Bases: object
A wrapper object of a rule function with associated values, whose changes are tracked by the experiment.
A task_gen method called before process_rule.
WARNING: This method is currently strongly tied to the internals of the process_rule method defined in waflib.TaskGen, so it may require modification in future versions of waf.
The role of this method is to create self.bld.cache_rule_attr, which is later used in process_rule. It is a dictionary mapping a (task name, task rule) pair to a task class. This task class is derived from the ExperimentTask class defined above and overrides its run method with the function given by the rule attribute written in the wscript. This process is necessary because process_rule cannot create a user-defined Task with a user-defined rule (as in our case).
In the current implementation of process_rule, cache_rule_attr is used as follows:
try:
    cache = self.bld.cache_rule_attr
except AttributeError:
    cache = self.bld.cache_rule_attr = {}
cls = None
if getattr(self, 'cache_rule', 'True'):
    try:
        cls = cache[(name, self.rule)]
    except KeyError:
        pass
if not cls:
    cls = Task.task_factory(name, self.rule,
        ....
This snippet first searches for a task class in the cache_rule_attr dictionary, so we set that dictionary beforehand.
Results of experimentation collected through a meta node, to be plotted.
The result of experiments is represented by a meta node consisting of a set of physical nodes, each of which contains a dictionary or an array of dictionaries. This class is used to collect all dictionaries through the meta node and to extract point sequences to plot.
Extracts a sequence of one-dimensional data points.
This function extracts the x coordinate of each result value and creates a list of them. If sort == True, then the list is sorted. The user can extract different sequences for varying values corresponding to the given key(s).
Returns: If key is None, then it returns a list of x values. Otherwise, it returns a dictionary from the key(s) to a sequence of x values. Each sequence consists of the values matched to the key(s).
Return type: dict or list
Extracts a sequence of two-dimensional data points.
See get_data_1d for details. The difference from get_data_1d is that the values are represented by pairs.
Returns: If key is None, then it returns a pair of an x value sequence and a y value sequence. Otherwise, it returns a dictionary from a key to a pair of an x value sequence and a y value sequence. Each sequence consists of the values matched to the key(s).
Return type: dict, or tuple of two lists
Extracts a sequence of three-dimensional data points.
See get_data_1d for details. The difference from get_data_1d is that the values are represented by triples.
Returns: If key is None, then it returns a triple of an x value sequence, a y value sequence, and a z value sequence. Otherwise, it returns a dictionary from a key to a triple of an x value sequence, a y value sequence, and a z value sequence. Each sequence consists of the values matched to the key(s).
Return type: dict, or tuple of three lists
Creates an aggregator to plot data using matplotlib and PlotData.
Parameters: callback_body (function or callable object whose signature is (matplotlib.figure.Figure, maflib.plot.PlotData, maflib.core.Parameter)) – Callable object or function that plots data. It takes three parameters: a matplotlib.figure.Figure object, a maflib.plot.PlotData object, and a parameter of class maflib.core.Parameter. The user must define a callback function that plots the given data to the given figure.
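A minimal sketch of such a callback, assuming the three-argument signature described above; the result keys 'C' and 'accuracy', the get_data_2d call, and the exp line in the comment are illustrative assumptions:
import maflib.plot

def plot_accuracy(figure, data, parameter):
    # 'C' and 'accuracy' are hypothetical keys of the collected results
    axes = figure.add_subplot(111)
    x, y = data.get_data_2d('C', 'accuracy')
    axes.plot(x, y, marker='o')
    axes.set_xlabel('C')
    axes.set_ylabel('accuracy')

# in a wscript: exp(source='result', target='accuracy.png',
#                   rule=maflib.plot.plot_by(plot_accuracy))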
Creates an aggregator that draws a line plot.
Aggregator that calculates the average value for each key.
The result contains every key that appears in at least one input. Each value is the average of the values of the corresponding key over all the inputs. If some value cannot be converted with float(), the corresponding key is omitted from the result.
Calculates various performance measures for multi-class classification.
The source of this task is assumed to be a JSON array, each item of which is a dictionary of the form {"p": 3, "c": 5}, where "p" indicates the predicted label and "c" indicates the correct label. If you use libsvm, create_label_result_libsvm converts the results to this format.
The output measures are summarized as follows; most of them are cited from (*):
Accuracy, AverageAccuracy, and ErrorRate are calculated over all the data, while Precision, Recall, F1, specificity, and AUC are calculated for each label.
For Precision, Recall, and F1, averaged results are also calculated. There are two different types of averaging: micro and macro. The micro average is calculated using global counts of true positives, false positives, etc., while the macro average is calculated naively by averaging the per-label values over the number of labels.
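The two averaging schemes can be sketched as follows for precision; the per-label counts are made up and this only illustrates the arithmetic, not the internal implementation of this task:
# per-label counts: label -> (true positives, false positives)
counts = {1: (50, 10), 2: (30, 30), 3: (20, 0)}

# micro average: pool the counts over all labels first
tp_sum = sum(tp for tp, fp in counts.values())
fp_sum = sum(fp for tp, fp in counts.values())
precision_micro = tp_sum / float(tp_sum + fp_sum)

# macro average: average the per-label precisions
precisions = [tp / float(tp + fp) for tp, fp in counts.values()]
precision_macro = sum(precisions) / len(precisions)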
The output of this task is one JSON file, like
{
    "accuracy": 0.7,
    "average_accuracy": 0.8,
    "error_rate": 0.12,
    "1-precision": 0.5,
    "1-recall": 0.8,
    "1-F1": 0.6,
    "1-specifity": 0.6,
    "1-AUC": 0.7,
    "precision-micro": 0.7,
    "precision-macro": 0.6,
    ...
    "2-precision": 0.6,
    "2-recall": 0.7,
    ...
}
where accuracy, average_accuracy, and error_rate correspond to Accuracy, AverageAccuracy, and ErrorRate, respectively. The average is the macro average over all the data, which is consistent with the output of, e.g., svm-predict. Other results (e.g. 1-precision) are calculated for each label and represented as a pair of "label" and "measure" combined with a hyphen. For example, 1-precision is the precision for label 1, while 3-F1 is the F1 for label 3.
(*) Marina Sokolova and Guy Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing and Management 45 (2009) 427–437.
Rule that converts the message output by svm-predict into a JSON file.
This rule can be used to parse the output message of the svm-predict command of LIBSVM, which contains the prediction accuracy. The output is formatted like {"accuracy": <value>}.
Parameters: task (waflib.Task.Task) – waf task.
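For reference, a standalone sketch of this conversion; svm-predict prints a line like 'Accuracy = 85.2% (426/500) (classification)', the helper name parse_svm_predict_message is made up, and the exact parsing inside maflib (including whether the value is kept as a percentage) may differ:
import json
import re

def parse_svm_predict_message(message):
    # pull the percentage out of a line like "Accuracy = 85.2% (426/500) (classification)"
    match = re.search(r'Accuracy = ([\d.]+)%', message)
    return json.dumps({'accuracy': float(match.group(1))})

print(parse_svm_predict_message('Accuracy = 85.2% (426/500) (classification)'))
# => {"accuracy": 85.2}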
TODO(noji) write document.
A rule to decompress an input file.
Parameters: filetype (str) – Type of the compressed file. The following values are available.
Returns: A rule.
Return type: maflib.core.Rule
Creates a rule to download a file from a given URL.
It stores the file in the target node. If decompress_as is given, then it automatically decompresses the downloaded file.
Returns: A rule.
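A hypothetical usage sketch in a wscript, assuming the URL is passed as the first argument of the rule factory; the URL and file name are placeholders:
import maflib.rules

def build(exp):
    # store the downloaded file in the target node
    exp(target='news20.scale.bz2',
        rule=maflib.rules.download('http://example.com/news20.scale.bz2'))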
Creates an aggregator to select the maximum value of a given key.
The created aggregator chooses the result with the maximum value of key, and writes the JSON object to the output node.
Parameters: key (str) – A key to be used for selecting the maximum value.
Returns: An aggregator.
Return type: maflib.core.Rule
Creates an aggregator to select the minimum value of a given key.
The created aggregator chooses the result with the minimum value of key, and writes the JSON object to the output node.
Parameters: key (str) – A key to be used for selecting the minimum value.
Returns: An aggregator.
Return type: maflib.core.Rule
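A hedged usage sketch: assuming each result node is a JSON object containing an 'accuracy' key, that experiments vary over a parameter 'C', and that the experiment uses maf's aggregate_by mechanism, keeping the best result might look like this:
import maflib.rules

def build(exp):
    # collapse the 'C' axis, keeping the result with the highest accuracy
    exp(source='result',
        target='best_result',
        aggregate_by='C',
        rule=maflib.rules.max('accuracy'))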
Creates a rule that splits a line-by-line dataset into training and validation subsets for n-fold cross validation.
Assume the input dataset is a text file where each sample is written on a distinct line. This task splits this dataset into the given number of folds, extracts the n-th fold as a validation set (where n is specified by the parameter of the given key) and the others as a training set, and then writes these subsets to the output nodes. This is the usual workflow of cross validation in machine learning.
Note that this task does not shuffle the input dataset. If the order causes an imbalance between the folds, then the user should add a task that shuffles the dataset before this task.
This task requires a parameter indicating the index of the fold. The parameter name is specified by parameter_name. The index must be a non-negative integer less than num_folds.
Returns: A rule.
Return type: function
Segments example-per-line data into k folds, where k is the length of the param weights.
This method considers label bias when segmenting: in machine learning experiments, we often want to prepare training or testing examples in equal proportions for each label for correct evaluation. weights specifies, for each label, the proportion of examples that go into the k-th fold.
A typical usage of this task is as follows:
exp(source='news20.scale',
    target='train dev test',
    rule=segment_without_label_bias([0.8, 0.1, 0.1]))
This exp segments the data news20.scale into 3 folds for train/dev/test. For each label, train contains 80% of the examples of that label, while dev and test each contain 10% of them.
The input is assumed to be in an example-per-line format, such as the libsvm or vowpal format. The param extract_label specifies how to extract the label from each line, so you can handle other formats by customizing this function, as long as they follow the one-example-per-line format.
Bases: waflib.Context.Context
A context class for executing unit tests of maf.
Adds tests to execute.
Parameters: tests_list – Tests to add, specified in the following way.
See waflib.Context.Context.execute()
Bases: object
A task object that makes it easy to write unit tests for rules.
This class mimics the behavior of a task object by holding dummy Node objects internally. These node objects are maflib.core.ExperimentNode() instances.
Example usages of this task can be found in test_rules.py.
inputs and outputs are instances of ExperimentNodeList. This class makes it easy to access input/output node objects by automatically adding new elements when necessary. NOTE: You should not add elements to this list manually, e.g., with task.outputs.append(...). Instead, use setsize(size) or index access; accessing, e.g., task.outputs[3] automatically appends elements up to that index.
A ConfigSet to store any attributes.
ConfigSet is a class defined by waflib that is used as a dictionary to store arbitrary attributes. Its values can be accessed both by attribute and by key:
task = TestTask()
task.env.FOO = 'test'
task.env['FOO'] # => 'test'
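Building on this, a minimal sketch of testing a rule with TestTask, assuming the automatically growing inputs/outputs lists described above (my_rule is a hypothetical rule that upper-cases its input; the import of TestTask is omitted as in the example above):
def my_rule(task):
    # hypothetical rule under test
    task.outputs[0].write(task.inputs[0].read().upper())

task = TestTask()
task.inputs[0].write('maf')  # index access grows the node list automatically
my_rule(task)
assert task.outputs[0].read() == 'MAF'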
Creates an aggregator from the function callback_body, independently of waf.
This function creates a wrapper of the given callback function that behaves as a rule of an aggregation task. It assumes that the input files are JSON files, each of which is a flat JSON object (i.e. an object that does not contain any objects) or a JSON array of flat objects. The created rule first combines these JSON objects into an array of Python dictionaries, and then passes it to the user-defined callback body.
There are two ways to write the result to the output node. The first is to let callback_body return the content string to be written to the output node; the rule then automatically writes it to the output node. The second is to let callback_body write it using its second argument (called abspath), which is the absolute path to the output node. In this case, callback_body MUST return None to suppress the automatic writing.
This function is often used as a decorator. See maflib.rules or maflib.plot to get examples of callback_body.
Parameters: callback_body (function or callable object of signature (list, str, Parameter)) – A function or a callable object that takes three arguments: values, abspath, and parameter. values is an array of dictionaries that represents the content of the input files. abspath is the absolute path to the output node. parameter is the parameter of the output node, i.e. the parameter of this task. This function should return str or None.
Returns: An aggregator function that calls callback_body.
Return type: function
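A minimal sketch of an aggregator defined with this decorator, following the (values, abspath, parameter) convention above; the 'accuracy' key and the maflib.util location of the decorator are assumptions:
import json
import maflib.util

@maflib.util.aggregator
def average_accuracy(values, abspath, parameter):
    # values is the list of dicts loaded from the input JSON files
    mean = sum(v['accuracy'] for v in values) / float(len(values))
    # returning a string lets the created rule write it to the output node
    return json.dumps({'accuracy': mean})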
Creates an aggregator that outputs the aggregated result as JSON.
The result of an aggregator task is often JSON-formatted for later tasks, as in maflib.rules.max and maflib.rules.average. In maflib.rules.max, for example, the parameter setting corresponding to the maximum is needed by future tasks, so the parameter must also be dumped in JSON format. However, this is problematic when the parameter is not JSON-serializable, e.g., when it contains an object of a user-defined class. To avoid this problem, this aggregator decorator first makes the parameter (a dict) JSON-serializable by converting its non-serializable values into strings. All JSON-serializable values remain the same; e.g., int values are not converted to strings.
Parameters: callback_body (function or callable object of signature (list, str, parameter)) – A function or a callable object that takes the same arguments as that of aggregator, but returns an object, which is then serialized to JSON. See maflib.rules.max for an example.
Returns: An aggregator.
Return type: function
Generates a direct product of given listed parameters.
Here is an example.
maflib.util.product({'x': [0, 1, 2], 'y': [1, 3, 5]})
# => [{'x': 0, 'y': 1}, {'x': 0, 'y': 3}, {'x': 0, 'y': 5},
#     {'x': 1, 'y': 1}, {'x': 1, 'y': 3}, {'x': 1, 'y': 5},
#     {'x': 2, 'y': 1}, {'x': 2, 'y': 3}, {'x': 2, 'y': 5}]
# (the order of parameters may be different)
Parameters: parameter (dict from str to list) – A dictionary that represents a set of parameters. Its values are lists of the values to be enumerated.
Returns: A direct product of the set of parameters.
Return type: list of dict
Decorator to define a rule function that takes parameters and arguments interchangeably.
When defining a rule that takes some parameters or arguments, it is recommended to use the rule() decorator. The main reason is that whether an argument is contained in the task parameter or not should be decided by the users of the rule, since that is determined by the design of their experiment, not by the design of the rule.
usage:
@maflib.util.rule
def my_rule(task):
    # do something with parameters 'a' and 'b'
    pass

def build(exp):
    # indicate by argument
    exp(target='t', rule=my_rule(a=1, b=2))
    # indicate by parameter
    exp(target='s', parameters=[{'a': 1, 'b': 2}], rule=my_rule())
    # mixed usage
    exp(target='u', parameters=[{'a': 1}], rule=my_rule(b=2))
    # if no arguments are used, parens can be omitted
    exp(target='r', parameters=[{'a': 1, 'b': 2}], rule=my_rule)
Parameters: callback_body (function) – A function that receives a task instance and does its own work. It is almost the same as a usual task function; the only difference is that the parameter of the given task is expanded with the arguments, as in the example above.
Returns: A task generator function.
Return type: function
Randomly samples parameters from given distributions.
This function samples parameter combinations, each of which is a dictionary mapping a key to a value sampled from the distribution corresponding to that key. It is useful for hyper-parameter optimization compared to using product, since every sampled instance can differ from the others in all dimensions.
Returns: A list of sampled parameters.
Return type: list of dict
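Since the parameter list of this function is not shown above, the following is only a conceptual sketch of the sampling idea rather than the maflib signature: each key draws its value independently from its own distribution.
import random

def sample_parameters(distributions, num_samples):
    # distributions: dict mapping each key to a zero-argument sampling function
    return [dict((key, draw()) for key, draw in distributions.items())
            for _ in range(num_samples)]

params = sample_parameters(
    {'C': lambda: 10 ** random.uniform(-3, 3),
     'gamma': lambda: random.uniform(0.0, 1.0)},
    num_samples=5)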