maflib Package

core Module

The core of maf, an environment for computational experiments built on waf.

This module contains the core functionality of maf that handles parameterized tasks and metanodes.

class maflib.core.CallObject(**kw)

Bases: object

Object representing one call of ExperimentContext.__call__().

parameters = None

List of parameters indicated by the taskgen call.

exception maflib.core.CyclicDependencyException

Bases: exceptions.Exception

Exception raised when experiment graph has a cycle.

class maflib.core.ExpOptionsContext(**kw)

Bases: waflib.Options.OptionsContext

OptionsContext specific to ExperimentContext.

Please extend the __init__ method below to add new options.

class maflib.core.ExperimentContext(**kw)

Bases: waflib.Build.BuildContext

Context class of waf experiment (a.k.a. maf).

class maflib.core.ExperimentGraph

Bases: object

Bipartite graph consisting of meta nodes and call object nodes.

add_call_object(call_object)

Adds call object node, related meta nodes and edges.

Parameters:call_object (CallObject) – Call object to be added.
get_sorted_call_objects()

Runs topological sort on the experiment graph.

Returns:List of call objects in topological order.
Return type:list of CallObject
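
The ordering described above can be sketched independently of maf. The following is a minimal illustration using Kahn's algorithm, not the actual maflib implementation: the triple format (name, sources, targets), the function sort_calls and the exception CycleError are all hypothetical names introduced here.

```python
from collections import defaultdict, deque


class CycleError(Exception):
    """Hypothetical stand-in for CyclicDependencyException."""


def sort_calls(calls):
    """Topologically sort calls given as (name, sources, targets) triples.

    A call depends on every call that produces one of its sources.
    """
    producers = defaultdict(list)  # meta node -> names of calls that write it
    for name, _, targets in calls:
        for target in targets:
            producers[target].append(name)

    successors = defaultdict(set)  # call -> calls that consume its outputs
    indegree = {name: 0 for name, _, _ in calls}
    for name, sources, _ in calls:
        for source in sources:
            for producer in producers[source]:
                if name not in successors[producer]:
                    successors[producer].add(name)
                    indegree[name] += 1

    # Kahn's algorithm: repeatedly emit calls with no unresolved dependency.
    ready = deque(name for name, _, _ in calls if indegree[name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in sorted(successors[current]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(calls):
        raise CycleError("experiment graph has a cycle")
    return order
```

Calls that never become ready are exactly those on or behind a cycle, which is why comparing the lengths is enough to detect one.
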
class maflib.core.ExperimentNode(waflib_node=None)

Bases: object

A wrapper of the Node object, used in ExperimentTask to replace input/output Nodes.

The main motivation for this class is to make it easy to write unit tests for user-defined rules. In maf, a user defines a rule by writing a function that receives the task object as an argument and then reads (or writes) an input (or output) Node object through calls such as task.inputs[0].read. Because maf generates the task internally, testing such a function would otherwise require a mock object that mimics the behavior of the Task object, which is tedious. ExperimentNode removes this burden.

This Node wrapper behaves in two different ways. In an ordinary task (the usual case), it is a thin wrapper around the Node object given to the constructor; the commonly used methods read, write, and abspath behave the same as those of the ordinary Node object. At test time, a user can obtain a dummy Node object by calling the constructor with no argument. In that case, the class creates a temporary file and keeps it internally, and read and write operate on this temporary file, which saves the effort of defining dummy Node objects for each rule. The class abstracts away the difference between these two cases.

Example usages of this class in test cases can be found in, for example, tests/test_rule.py. See also test.TestTask().

abspath()
read()
write(s)
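
The two modes described above can be illustrated with a small stand-alone sketch. It is not the maflib source: the class name NodeWrapper is hypothetical, and the only assumption about the wrapped object is that it offers abspath, read and write.

```python
import os
import tempfile


class NodeWrapper(object):
    """Hypothetical sketch of a node wrapper with a test mode."""

    def __init__(self, node=None):
        self._node = node
        self._path = None
        if node is None:
            # Test mode: back the node with a temporary file.
            handle, self._path = tempfile.mkstemp()
            os.close(handle)

    def abspath(self):
        return self._path if self._node is None else self._node.abspath()

    def read(self):
        if self._node is not None:
            return self._node.read()  # delegate to the wrapped node
        with open(self._path) as f:
            return f.read()

    def write(self, s):
        if self._node is not None:
            return self._node.write(s)  # delegate to the wrapped node
        with open(self._path, "w") as f:
            f.write(s)
```

A rule under test can then be handed NodeWrapper() instances as inputs and outputs without constructing any waf objects.
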
class maflib.core.ExperimentTask(env, generator)

Bases: waflib.Task.Task

A task class specific to ExperimentContext.

The purpose of this class is to carry the parameter as an attribute. The base class (waflib.Task.Task) carries no attributes other than env, and env must be a string-valued dictionary, which is problematic when we want to use the parameter values as they are. For example, a float value converted to a string can lose information.

Another motivation for this task is to control the hash value of a task. The hash is calculated from the entries of env whose keys are registered in vars or dep_vars. In __init__, this task registers the necessary keys to dep_vars.

hcode = '\tdef run(self):\n\t\tbld=self.generator.bld\n\t\tif bld.cache_global and not bld.nocache:\n\t\t\tif self.can_retrieve_cache():\n\t\t\t\treturn 0\n\t\treturn m1(self)\n'
parameter = None

Parameter whose values are not stringized.

post_run()
run()
shell = True

Supports pipe-style rule strings by default.

sig_explicit_deps()

Calculates the hash value of this task.

Overridden from waflib.Task.Task to use _node_sig to calculate the hash value of source/target files.

source_parameters = None

List of parameters each of which is the parameter of the corresponding input node.

class maflib.core.GraphContext(**kw)

Bases: maflib.core.ExperimentContext

Outputs a graph of dependencies between tasks.

class MetaNodes(unique_nodes)

Bases: object

A collection of meta nodes.

This class is essentially a hash table that holds a collection of node ids sharing the same meta node signature. The meta node signature is calculated by GraphContext._extract_meta_node().

add_node(node, id)
render_graphviz(node_indexer, ctx)
class GraphContext.MetaTasks(tasks)

Bases: object

A collection of meta tasks, similar to MetaNodes.

add_task(task, id)
max_num_invis = 10
num_invis_around_task = 3
num_invis_per_nodes = 3
render_graphviz()
render_invisibles(node_indexer)
class GraphContext.NodeIndexer

Bases: object

Indexer assigning a unique id to each Node instance.

Because each Node instance has a unique absolute path, Node -> id mappings are managed with a dictionary of type dict(str, id) that maps each path to an id.

get(node_id)
get_id(node)
GraphContext.cmd = 'graph'
GraphContext.execute()

See waflib.Context.Context.execute().

GraphContext.node_label(node)
exception maflib.core.InvalidMafArgumentException

Bases: exceptions.Exception

Exception raised when the arguments of ExperimentContext.__call__ are invalid.

class maflib.core.OldExperimentContext(**kw)

Bases: maflib.core.ExperimentContext

cmd = 'experiment'
fun = 'experiment'
variant = 'experiment'
class maflib.core.Parameter

Bases: dict

Parameter of maf task.

This is a dict that supports hash(). Be careful when using it with set(): a Parameter is hashable, but it is also mutable.

conflict_with(parameter)

Checks whether this parameter conflicts with another given parameter.

Returns:True if self conflicts with parameter, i.e. the two contain different values for the same key.
Return type:bool
to_str_valued_dict()

Gets dictionary with stringized values.

Returns:A dictionary with the same keys and stringized values.
Return type:dict of str key and str value
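
The behavior documented for Parameter can be approximated by a short sketch. This is an illustration only, not the maflib source: the class name HashableDict is hypothetical, and the hash shown here assumes that all values are themselves hashable.

```python
class HashableDict(dict):
    """Hypothetical stand-in for maflib.core.Parameter."""

    def __hash__(self):
        # Order-independent hash over the current items (assumes hashable values).
        return hash(frozenset(self.items()))

    def conflict_with(self, other):
        # Conflict: a key present in both that maps to different values.
        return any(k in other and other[k] != v for k, v in self.items())

    def to_str_valued_dict(self):
        return dict((k, str(v)) for k, v in self.items())
```

Because the hash is derived from the current contents, modifying an instance after it has been placed in a set can make it unreachable there, which is the pitfall the note above warns about.
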
class maflib.core.ParameterIdGenerator(path, text_path)

Bases: object

Consistent generator of identifiers for physical nodes, based on their parameters.

A meta node has a path and its own set of parameters, each of which corresponds to one physical waf node named ‘path/N’, where N is a unique name of the parameter. The correspondence between a parameter and its name must be consistent across multiple executions of waf, so the table is serialized to a hidden file.

This class also dumps the correspondence to a human-readable text file. The file contains one tab-separated line per correspondence: the first element is an identifier and the second is a JSON representation of the corresponding parameter.

NOTE: On exception raised during task generation, save() must be called to avoid inconsistency on node names that had been generated before the exception was raised.

get(parameter_id)

Gets the parameter of a given id.

Parameters:parameter_id – Id of the parameter
Returns:Parameter object of a given id.
Return type:Parameter
get_id(parameter)

Gets the id of given parameter.

Parameters:parameter (Parameter) – Parameter object.
Returns:Identifier of given parameter. The id may be generated in this method if necessary.
Return type:str
path = None

Path to file that the table is serialized to.

save()

Serializes the table to the file at self.path.

text_path = None

Path to file that the table is dumped to as a human-readable text.
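
To make the idea of a persistent parameter-to-id table concrete, here is a minimal sketch. It is not the maflib implementation: the class name IdTable is hypothetical, the on-disk format (JSON) and the id scheme (a running counter) are assumptions, and parameters are assumed to be JSON-serializable.

```python
import json
import os
import tempfile


class IdTable(object):
    """Hypothetical sketch of a table that survives across runs."""

    def __init__(self, path, text_path):
        self.path = path
        self.text_path = text_path
        self._ids = {}
        if os.path.exists(path):
            with open(path) as f:
                self._ids = json.load(f)  # reuse ids from earlier runs

    def get_id(self, parameter):
        # Canonical key: the same parameter always yields the same string.
        key = json.dumps(parameter, sort_keys=True)
        if key not in self._ids:
            self._ids[key] = str(len(self._ids))
        return self._ids[key]

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self._ids, f)
        # Human-readable dump: "<id>\t<JSON of the parameter>" per line.
        with open(self.text_path, "w") as f:
            for key, ident in sorted(self._ids.items(), key=lambda kv: int(kv[1])):
                f.write("%s\t%s\n" % (ident, key))
```

If save() is skipped after new ids were handed out, a later run could assign the same id to a different parameter, which is the inconsistency the NOTE above refers to.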

class maflib.core.Rule(fun, dependson=[])

Bases: object

A wrapper around a rule function together with associated values whose changes are tracked by the experiment.

Parameters:
  • fun – target function of the task.
  • dependson – list of variables or functions to track. All of them are later converted to strings, so an instance of a user-defined class must provide a meaningful __str__ method.
add_dependson(dependson)
stred_dependson()
maflib.core.configure(conf)
maflib.core.options(opt)
maflib.core.register_experiment_task_with_rule(self)

A task_gen method called before process_rule.

WARNING: This method is currently tightly coupled to the internals of the process_rule method defined in waflib.TaskGen, so it may require modification for future versions of waf.

The role of this method is to create self.bld.cache_rule_attr, which is later used in process_rule. It is a dictionary that maps a (task_name, rule of the task) pair to a task class. This task class is derived from ExperimentTask defined above and overrides its run method with the function given by the rule attribute written in the wscript. This step is necessary because process_rule cannot by itself create a user-defined Task with a user-defined rule (as in our case).

In the current implementation of process_rule, cache_rule_attr is used as follows:

try:
    cache = self.bld.cache_rule_attr
except AttributeError:
    cache = self.bld.cache_rule_attr = {}

cls = None
if getattr(self, 'cache_rule', 'True'):
    try:
        cls = cache[(name, self.rule)]
    except KeyError:
        pass
if not cls:
    cls = Task.task_factory(name, self.rule,
    ....

This snippet first searches the cache_rule_attr dictionary for a task class, so we populate that dictionary beforehand.

plot Module

class maflib.plot.PlotData(inputs)

Results of experiments collected through a meta node for plotting.

The results of experiments are represented by a meta node consisting of a set of physical nodes, each of which contains a dictionary or an array of dictionaries. This class collects all dictionaries across the meta node and extracts point sequences to plot.

get_data_1d(x, key=None, sort=True)

Extracts a sequence of one-dimensional data points.

This function extracts the x coordinate of each result value and creates a list of them. If sort == True, the list is sorted. When key is given, a separate sequence is extracted for each distinct value (or combination of values) of the given key(s).

Parameters:
  • x (str) – A key string corresponding to x coordinate.
  • key (None, str or tuple of strings) – Key strings that define distinct sequences of data points. It can be None, a string value or a tuple of string values.
  • sort (bool) – Flag for sorting the sequence(s).
Returns:

If key is None, then it returns a list of x values. Otherwise, it returns a dictionary from key(s) to a sequence of x values. Each sequence consists of values matched to the key(s).

Return type:

dict or list

get_data_2d(x, y, key=None, sort=True)

Extracts a sequence of two-dimensional data points.

See get_data_1d for details. The difference from get_data_1d is that the values are represented by pairs.

Parameters:
  • x (str) – A key string corresponding to x (first) coordinate.
  • y (str) – A key string corresponding to y (second) coordinate.
  • key (None, str or tuple of strings) – Key strings that define distinct sequences of data points. It can be None, a string value or a tuple of string values.
  • sort (bool) – Flag for sorting the sequence(s).
Returns:

If key is None, then it returns a pair of x value sequence and y value sequence. Otherwise, it returns a dictionary from a key to a pair of x value sequence and y value sequence. Each sequence consists of values matched to the key(s).

Return type:

dict or tuple of two lists

get_data_3d(x, y, z, key=None, sort=True)

Extracts a sequence of three-dimensional data points.

See get_data_1d for details. The difference from get_data_1d is that the values are represented by triples.

Parameters:
  • x (str) – A key string corresponding to x (first) coordinate.
  • y (str) – A key string corresponding to y (second) coordinate.
  • z (str) – A key string corresponding to z (third) coordinate.
  • key (None, str or tuple of strings) – Key strings that define distinct sequences of data points. It can be None, a string value or a tuple of string values.
  • sort (bool) – Flag for sorting the sequence(s).
Returns:

If key is None, then it returns a triple of x value sequence, y value sequence and z value sequence. Otherwise, it returns a dictionary from a key to a triple of x value sequence, y value sequence and z value sequence. Each sequence consists of values matched to the key(s).

Return type:

dict or tuple of three lists
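
The extraction performed by the get_data_* methods can be pictured with a stand-alone sketch of the two-dimensional case. The function extract_2d below is hypothetical and operates on a plain list of dictionaries; how the real methods treat records that lack a key, and how dictionary keys are shaped for tuple-valued key arguments, are assumptions here.

```python
from collections import defaultdict


def extract_2d(records, x, y, key=None, sort=True):
    """Hypothetical re-implementation for illustration only."""
    if key is None:
        points = [(r[x], r[y]) for r in records]
        if sort:
            points.sort()
        return [p[0] for p in points], [p[1] for p in points]

    # Normalize key to a tuple of key strings.
    keys = (key,) if isinstance(key, str) else tuple(key)
    groups = defaultdict(list)
    for r in records:
        label = tuple(r[k] for k in keys)
        # A single key string yields plain values as dictionary keys.
        groups[label if len(keys) > 1 else label[0]].append((r[x], r[y]))

    result = {}
    for label, points in groups.items():
        if sort:
            points.sort()
        result[label] = ([p[0] for p in points], [p[1] for p in points])
    return result
```

With key=None the result is a pair of sequences ready to pass to a plotting call; with a key, each dictionary entry can be drawn as one line of a multi-line plot.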

maflib.plot.plot_by(callback_body)

Creates an aggregator to plot data using matplotlib and PlotData.

Parameters:callback_body (function or callable object whose signature is (matplotlib.figure.Figure, PlotData, Parameter)) – Callable object or function that plots data. It takes three parameters: a matplotlib.figure.Figure object, a maflib.plot.PlotData object and a parameter of class maflib.core.Parameter. The user must define a callback function that plots the given data to the given figure.
maflib.plot.plot_line(x, y, legend=None)

Creates an aggregator that draws a line plot.

rules Module

maflib.rules.average(*args, **kargs)

Aggregator that calculates the average value for each key.

The result contains every key that appears in at least one input. Each value is the average of the corresponding key over all the inputs. If a key has a value that cannot be passed to float(), that key is omitted from the result.
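
The averaging rule can be sketched on plain Python dictionaries as follows. The function average_by_key is a hypothetical illustration, not the maflib code; in particular, it averages each key over the inputs that actually contain it, which is an assumption about how missing keys are handled.

```python
def average_by_key(values):
    """Average each key over a list of dictionaries (illustrative sketch)."""
    sums, counts, skipped = {}, {}, set()
    for record in values:
        for key, value in record.items():
            if key in skipped:
                continue
            try:
                number = float(value)
            except (TypeError, ValueError):
                # A single non-numeric value removes the key from the result.
                skipped.add(key)
                continue
            sums[key] = sums.get(key, 0.0) + number
            counts[key] = counts.get(key, 0) + 1
    return dict((k, sums[k] / counts[k]) for k in sums if k not in skipped)
```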

maflib.rules.calculate_stats_multiclass_classification(task)

Calculates various performance measures for multi-class classification.

The source of this task is assumed to be a JSON array, each item of which is a dictionary of the form {"p": 3, "c": 5}, where "p" indicates the predicted label and "c" indicates the correct label. If you use libsvm, create_label_result_libsvm converts the results to this format.

The output measures are summarized as follows; most of them are taken from (*):

Accuracy, AverageAccuracy, ErrorRate

The other measures (Precision, Recall, F1, Specificity and AUC) are calculated for each label.

For Precision, Recall and F1, averaged results are also calculated. There are two types of averaging: micro and macro. The micro average is calculated from the global counts of true positives, false positives, etc., while the macro average is calculated by simply dividing the sum of the per-label values by the number of labels.

The output of this task is one json file, like

{
  "accuracy": 0.7,
  "average_accuracy": 0.8,
  "error_rate": 0.12,
  "1-precision": 0.5,
  "1-recall": 0.8,
  "1-F1": 0.6,
  "1-specifity": 0.6,
  "1-AUC": 0.7,
  "precision-micro": 0.7,
  "precision-macro": 0.6,
  ...
  "2-precision": 0.6,
  "2-recall": 0.7,
  ...
}

where accuracy, average_accuracy and error_rate correspond to Accuracy, AverageAccuracy and ErrorRate respectively. Average is the macro average of all data, which is consistent with the output of e.g. svm-predict. Other results (e.g. 1-precision) are calculated for each label and represented as a pair of “label” and “measure” joined with a hyphen. For example, 1-precision is the precision for the label 1, while 3-F1 is F1 for the label 3.

(*) Marina Sokolova and Guy Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing and Management 45 (2009) 427-437.
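
The difference between micro and macro averaging is easiest to see on a small computation. The sketch below covers precision only and is not the maflib implementation; the function name precision_averages and the convention of returning 0.0 for an empty denominator are assumptions.

```python
def precision_averages(pairs):
    """pairs: list of {"p": predicted, "c": correct} dictionaries."""
    labels = sorted(set(d["p"] for d in pairs) | set(d["c"] for d in pairs))
    tp = dict((l, 0) for l in labels)  # true positives per label
    fp = dict((l, 0) for l in labels)  # false positives per label
    for d in pairs:
        if d["p"] == d["c"]:
            tp[d["p"]] += 1
        else:
            fp[d["p"]] += 1

    def ratio(num, den):
        return float(num) / den if den else 0.0

    per_label = dict((l, ratio(tp[l], tp[l] + fp[l])) for l in labels)
    # Micro: pool the counts of all labels, then take one ratio.
    micro = ratio(sum(tp.values()), sum(tp.values()) + sum(fp.values()))
    # Macro: average the per-label ratios, each label weighing the same.
    macro = ratio(sum(per_label.values()), len(labels))
    return per_label, micro, macro
```

For the four predictions (1,1), (1,1), (1,2), (2,2), written as (predicted, correct), label 1 has precision 2/3 and label 2 has precision 1, so the micro average is 3/4 while the macro average is 5/6.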

maflib.rules.convert_libsvm_accuracy(task)

Rule that converts the message output by svm-predict into a JSON file.

This rule can be used to parse the output message of the svm-predict command of LIBSVM, which contains the accuracy of the prediction. The output is formatted like {"accuracy": <value>}.

Parameters:task (waflib.Task.Task) – waf task.
maflib.rules.create_label_result_libsvm(task)

TODO(noji) write document.

maflib.rules.decompress(filetype='auto')

A rule to decompress an input file.

Parameters:filetype (str) –

Type of the compressed file. The following values are available:

  • 'auto': Use automatically detected type from the extension of the input file name.
  • 'bz2': bzip2 file.
  • 'gz': gzip file.
  • 'zip': zip file.
Returns:A rule.
Return type:maflib.core.Rule
maflib.rules.download(url, decompress_as='')

Creates a rule to download a file from a given URL.

It stores the file to the target node. If decompress_as is given, then it automatically decompresses the downloaded file.

Parameters:
  • url (str) – URL string of the file to be downloaded.
  • decompress_as – Decompression method for the downloaded file. If an empty string is given, no decompression is performed. Available values are 'bz2', 'gz' and 'zip'.
Returns:

A rule.

Return type:

maflib.core.Rule

maflib.rules.max(key)

Creates an aggregator to select the max value of given key.

The created aggregator chooses the result with the maximum value of key, and writes the JSON object to the output node.

Parameters:key (str) – A key to be used for selection of maximum value.
Returns:An aggregator.
Return type:maflib.core.Rule
maflib.rules.min(key)

Creates an aggregator to select the minimum value of given key.

The created aggregator chooses the result with the minimum value of key, and writes the JSON object to the output node.

Parameters:key (str) – A key to be used for selection of minimum value.
Returns:An aggregator.
Return type:maflib.core.Rule
maflib.rules.segment_by_line(num_folds, parameter_name='fold')

Creates a rule that splits a line-by-line dataset into the training and validation subsets of one fold for n-fold cross validation.

Assume the input dataset is a text file where each sample is written on a distinct line. This task splits the dataset into the given number of folds, extracts the k-th fold as the validation set (where k is given by the parameter named parameter_name), uses the other folds as the training set, and then writes these subsets to the output nodes. This is the usual workflow of cross validation in machine learning.

Note that this task does not shuffle the input dataset. If the order of the samples causes an imbalance between folds, the user should add a task that shuffles the dataset before this task.

This task requires a parameter indicating the index of the fold. The parameter name is specified by parameter_name. The index must be a non-negative integer less than num_folds.

Parameters:
  • num_folds – Number of folds for splitting. The inverse of this value is the ratio of the validation set size to the input dataset size. As noted above, the fold parameter must be less than num_folds.
  • parameter_name – Name of the parameter indicating the index of the fold.
Returns:

A rule.

Return type:

function
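
The split for one fold can be sketched as below. The function split_fold is hypothetical; in particular, cutting the dataset into contiguous blocks is an assumption, since the text above only states that the data is not shuffled.

```python
def split_fold(lines, num_folds, fold):
    """Return (train, validation) for the given fold index (sketch)."""
    if not 0 <= fold < num_folds:
        raise ValueError("fold must satisfy 0 <= fold < num_folds")
    n = len(lines)
    # Contiguous block boundaries of the requested fold (an assumption).
    begin = n * fold // num_folds
    end = n * (fold + 1) // num_folds
    validation = lines[begin:end]
    train = lines[:begin] + lines[end:]
    return train, validation
```

Running the same split for fold = 0, ..., num_folds - 1 uses every sample exactly once for validation.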

maflib.rules.segment_without_label_bias(weights, extract_label=<function <lambda>>)

Segments one-example-per-line data into k folds, where k is the length of the parameter weights.

This method takes label bias into account during segmentation: in machine learning experiments, we often want the training and testing examples to contain each label in equal proportions for a correct evaluation. weights specifies, for each label, the proportion of its examples that goes into the k-th fold.

A typical usage of this task is as follows:

exp(source='news20.scale',
    target='train dev test',
    rule=segment_without_label_bias([0.8, 0.1, 0.1]))

This exp segments the data news20.scale into three folds for train/dev/test. For each label, train contains 80% of the examples of that label, while dev and test each contain 10% of them.

The input is assumed to be in a one-example-per-line format, such as the libsvm or vowpal format. The parameter extract_label specifies how to extract the label from each line, so other formats can be handled by customizing this function as long as they follow the one-example-per-line format.

Parameters:
  • weights – list of floats specifying the weights by which the data are segmented
  • extract_label – function extracting the label from an input line
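
A label-aware split of this kind can be sketched without waf. The function stratified_split below is a hypothetical illustration, not the maflib code; the default label extractor (the first whitespace-separated token, as in the libsvm format) and the rounding of segment boundaries are assumptions.

```python
from collections import defaultdict


def stratified_split(lines, weights,
                     extract_label=lambda line: line.split(None, 1)[0]):
    """Split lines into len(weights) segments, label by label (sketch)."""
    by_label = defaultdict(list)
    for line in lines:
        by_label[extract_label(line)].append(line)

    total = float(sum(weights))
    segments = [[] for _ in weights]
    for label in sorted(by_label):
        examples = by_label[label]
        begin = 0
        cumulative = 0.0
        for i, w in enumerate(weights):
            # Boundary of segment i within this label's examples.
            cumulative += w
            end = int(round(len(examples) * cumulative / total))
            segments[i].extend(examples[begin:end])
            begin = end
    return segments
```

Because the boundaries are computed separately for every label, each segment receives roughly the same share of every label, regardless of how the labels are ordered in the input.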

test Module

class maflib.test.ExpTestContext(**kw)

Bases: waflib.Context.Context

A context class for executing unittests of maf.

add(tests_list)

Adds tests to execute.

Parameters:tests_list – Tests to add, specified in the following way:
  • file name (ends with .py): find all test classes in that file
  • directory name: find all test classes in files matching ‘test*.py’ in the directory
  • class name: add tests defined in the class
add_test_in_class(cls)
add_test_in_dir(dir_path)
add_test_in_path(test_path)
cmd = 'exptest'
execute()

See waflib.Context.Context.execute()

fun = 'exptest'
unique_(l)
class maflib.test.TestTask

Bases: object

A task object that makes it easy to write unit tests for rules.

This class mimics the behavior of a task object by holding dummy Node objects internally. These node objects are instances of maflib.core.ExperimentNode().

Example usages of this task can be found in test_rules.py.

inputs and outputs are instances of ExperimentNodeList. This class makes it easy to access input/output node objects by automatically adding new elements when necessary. NOTE: You should not add elements to this list manually, e.g., with task.outputs.append(...). Instead, use setsize(size) or index access such as task.outputs[3], which automatically appends the missing elements up to index 2.

class ExperimentNodeList

Bases: list

setsize(size)
TestTask.env = None

A ConfigSet to store any attributes.

ConfigSet is a class defined by waflib that is used as a dictionary to store any attributes. Its values can be accessed either as attributes or by keys:

task = TestTask()
task.env.FOO = 'test'
task.env['FOO'] # => 'test'
TestTask.json_output(index)
TestTask.set_input(index, s)
TestTask.set_input_by_json(index, obj)

util Module

maflib.util.aggregator(callback_body)

Creates an aggregator from a function callback_body that is independent of waf.

This function creates a wrapper of given callback function that behaves as a rule of an aggregation task. It supposes that input files are represented by JSON files each of which is a flat JSON object (i.e. an object that does not contain any objects) or a JSON array of flat objects. The created rule first combines these JSON objects into an array of Python dictionaries, and then passes it to the user-defined callback body.

There are two ways to write the result to the output node. The first is to let callback_body return the content string to be written to the output node; the rule then writes it to the output node automatically. The second is to let callback_body write the result itself using its second argument (called abspath), which is the absolute path to the output node. In this case, callback_body MUST return None to suppress the automatic writing.

This function is often used as a decorator. See maflib.rules or maflib.plot to get examples of callback_body.

Parameters:callback_body (function or callable object of signature (list, str, parameter)) – A function or a callable object that takes three arguments: values, abspath, and parameter. values is an array of dictionaries that represents the content of the input files. abspath is the absolute path to the output node. parameter is the parameter of the output node, i.e. the parameter of this task. This function should return str or None.
Returns:An aggregator function that calls callback_body.
Return type:function
maflib.util.json_aggregator(callback_body)

Creates an aggregator that outputs the aggregated result as JSON.

The result of an aggregator task is often JSON-formatted for later tasks, as in maflib.rules.max and maflib.rules.average. In maflib.rules.max, for example, the parameter setting corresponding to the maximum is needed by later tasks, so the parameter must also be dumped to JSON. However, this is problematic when the parameter is not JSON-serializable, e.g. when it contains an object of a user-defined class. To avoid this problem, this aggregator decorator first makes the parameter JSON-serializable by converting the non-serializable values of the parameter (a dict) into strings. All JSON-serializable values remain unchanged, e.g. int values are not converted to strings.

Parameters:callback_body (function or callable object of signature (list, str, parameter)) – A function or a callable object that takes the same arguments as that of aggregator, but returns an object that is then serialized to JSON. See maflib.rules.max for an example.
Returns:An aggregator.
Return type:function
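
The conversion step described above can be sketched as follows. The helper to_json_serializable is a hypothetical name, and testing serializability by attempting json.dumps on each value is an assumption about how such a check might be done, not a description of the maflib code.

```python
import json


def to_json_serializable(parameter):
    """Stringize only the values that json cannot serialize (sketch)."""
    converted = {}
    for key, value in parameter.items():
        try:
            json.dumps(value)
            converted[key] = value  # serializable values are kept as they are
        except (TypeError, ValueError):
            converted[key] = str(value)  # fall back to the string form
    return converted
```

This is also why a user-defined class used as a parameter value should provide a meaningful __str__ method.
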
maflib.util.product(parameter)

Generates a direct product of given listed parameters.

Here is an example.

maflib.util.product({'x': [0, 1, 2], 'y': [1, 3, 5]})
# => [{'x': 0, 'y': 1}, {'x': 0, 'y': 3}, {'x': 0, 'y': 5},
#     {'x': 1, 'y': 1}, {'x': 1, 'y': 3}, {'x': 1, 'y': 5},
#     {'x': 2, 'y': 1}, {'x': 2, 'y': 3}, {'x': 2, 'y': 5}]
# (the order of parameters may be different)
Parameters:parameter (dict from str to list.) – A dictionary that represents a set of parameters. Its values are lists of values to be enumerated.
Returns:A direct product of a set of parameters.
Return type:list of dict.
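
For readers who want to see how such a direct product can be computed, here is a minimal sketch based on itertools.product. The function direct_product is a hypothetical re-implementation for illustration; the order of the resulting list is fixed here by sorting the keys, which the real function does not promise.

```python
import itertools


def direct_product(parameter):
    """Enumerate all combinations of the listed values (sketch)."""
    keys = sorted(parameter)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(parameter[k] for k in keys))]
```
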
maflib.util.rule(callback_body)

Decorator to define a rule function that takes parameters and arguments interchangeably.

When defining a rule that takes some parameters or arguments, using the rule() decorator is recommended. The main reason is that the users of the rule should decide whether an argument is part of the task parameter or not, since this depends on the design of their experiment, not on the design of the rule.

Usage:

import maflib.util

@maflib.util.rule
def my_rule(task):
    # do something with parameters 'a' and 'b'
    pass

def build(exp):
    # indicate by argument
    exp(target='t', rule=my_rule(a=1, b=2))

    # indicate by parameter
    exp(target='s', parameters=[{'a': 1, 'b': 2}], rule=my_rule())

    # mixed usage
    exp(target='u', parameters=[{'a': 1}], rule=my_rule(b=2))

    # if no arguments are used, parens can be omitted
    exp(target='r', parameters=[{'a': 1, 'b': 2}], rule=my_rule)
Parameters:callback_body (function) – A function that receives a task instance and does its own work. It is almost the same as a usual task function; the only difference is that the parameter of the given task is extended with the arguments, as in the example above.
Returns:A task generator function.
Return type:function
maflib.util.sample(num_samples, distribution)

Randomly samples parameters from given distributions.

This function samples parameter combinations, each of which is a dictionary from a key to a value sampled from the distribution corresponding to that key. Compared to product, this is useful for hyper-parameter optimization, since the sampled instances can differ from each other in every dimension.

Parameters:
  • num_samples (int) – Number of samples. Resulting meta node contains this number of physical nodes for each input parameter set.
  • distribution

    Dictionary from parameter names to values specifying the distributions to sample from. The following values are acceptable:

    Pair of numbers
    (a, b) specifies a uniform distribution on the continuous interval [a, b).
    List of values
    This specifies a uniform distribution on the discrete set of values.
    Callable object or function
    f can be used as an arbitrary generator of values. Multiple calls of f() should generate random samples from a user-defined distribution.
Returns:

A list of sampled parameters.

Return type:

list of dict.
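
The three kinds of distribution specification can be illustrated with a short sketch. The function sample_parameters is hypothetical and not the maflib implementation; treating two-element tuples as intervals and lists as discrete sets is an assumption, and random.uniform may return the upper endpoint in rare cases because of floating-point rounding.

```python
import random


def sample_parameters(num_samples, distribution, rng=random):
    """Draw num_samples parameter dictionaries (illustrative sketch)."""
    def draw(spec):
        if callable(spec):
            return spec()  # user-defined generator
        if isinstance(spec, tuple) and len(spec) == 2:
            return rng.uniform(spec[0], spec[1])  # continuous interval
        return rng.choice(list(spec))  # discrete set of values

    return [dict((name, draw(spec)) for name, spec in distribution.items())
            for _ in range(num_samples)]
```

For example, sample_parameters(5, {"C": (0.1, 10.0), "kernel": ["linear", "rbf"]}) yields five dictionaries, each with an independently drawn C and kernel.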

maflib.util.set_random_seed(x)