Invariant Inference & Representation

traincheck-infer is part of the inference stage of the TrainCheck workflow. It consumes trace files collected from correct training runs and infers behavioral invariants that describe expected runtime behavior. These invariants are later used by traincheck-check to detect violations in other training pipelines.

🔧 Basic Usage

In most cases, you only need to specify one or more folders (generated by traincheck-collect) containing trace files using the -f or --trace-folders flag:

```bash
traincheck-infer -f ./traincheck_mnist_trace ./traincheck_84911_trace ...
```

You can provide multiple folders to aggregate traces from different correct runs or programs. This helps TrainCheck generalize better and avoid overfitting to any single pipeline, reducing false positives during checking—especially when the inferred invariants are applied to unrelated or structurally different pipelines.

This command infers invariants from all provided trace folders and writes them to invariants.json.
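
If you want to quickly inspect what was inferred, the output can be read with a few lines of Python. The snippet below is a minimal sketch that assumes invariants.json is a JSON array of invariant objects shaped like the example in the Invariant Representation section below; if your version of traincheck-infer writes one JSON object per line instead, parse the file line by line.

```python
import json

# Load the inferred invariants (assumption: the file is a single JSON array).
with open("invariants.json") as f:
    invariants = json.load(f)

# Print a one-line summary per invariant.
for inv in invariants:
    print(inv.get("relation"), "-", inv.get("text_description"))
```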

⚙️ Advanced Usage

traincheck-infer provides additional flags for customization and debugging. Some concepts such as "relation" will be explained later.

  1. -o, --output: Specify a custom file name for the invariants.
  2. --disable-relation / --enable-relation: Control which types of invariants to infer. This is useful for reducing noise or targeting specific checks.

    ```bash
    # Disable ordering-based invariants
    traincheck-infer -f ./traces --disable-relation FunctionLeadRelation FunctionCoverRelation

    # Enable only contain and variable consistency invariants
    traincheck-infer -f ./traces --enable-relation APIContainRelation ConsistencyRelation
    ```

    See traincheck.invariant.relation_pool for the complete list of supported relations.

  3. -b, --backend: Select the data processing engine for trace handling.

    • pandas (default): stable and well-tested.
    • polars: faster for large traces (experimental).
    • dict: pure Python dictionary backend (experimental).

Other flags (e.g., --debug, -t/--traces) are available via traincheck-infer --help, but they are rarely needed unless you are debugging or developing TrainCheck itself.

📘 Invariant Concepts

TrainCheck infers invariants: logical properties that consistently hold during correct training runs. These invariants define the expected behavior of a training pipeline and later help detect silent issues when applied to other runs.

Each invariant describes a specific pattern of behavior observed in the trace, such as:

  • Attribute changes during a function call (e.g., .grad becomes None in zero_grad())
  • Ordering relationships between API calls (e.g., zero_grad() should occur before step())
  • Consistency among values across different parameters (e.g., shared parameters should have the same value across devices during distributed training)

Invariant Representation

An invariant is defined by three things:

  1. relation: the relationship this invariant encodes; it can be viewed as an invariant template. Each relation has its own inference algorithm (e.g., ConsistencyRelation.infer).
  2. params: descriptors for the entities that should obey the relationship.
  3. precondition: a logical predicate defining the context in which the invariant applies.
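
In code, this structure can be pictured roughly as the record below. This is a hypothetical sketch for illustration only; it does not mirror TrainCheck’s internal classes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Invariant:
    # Illustrative only: a minimal record with the three components above,
    # not TrainCheck's internal representation.
    relation: str                                # the invariant template, e.g. "APIContainRelation"
    params: list = field(default_factory=list)   # descriptors of the entities bound to the relation
    precondition: Optional[dict] = None          # predicate describing when the invariant applies
```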

In the actual JSON representation of invariants in the traincheck-infer output, an invariant looks like this:

```json
{
  "text_description": "torch.optim.optimizer.Optimizer.zero_grad contains VarChangeEvent torch.nn.Parameter, pre_value: non_zero, post_value: None",
  "relation": "APIContainRelation",
  "params": [
    {
      "param_type": "APIParam",
      "api_full_name": "torch.optim.optimizer.Optimizer.zero_grad"
    },
    {
      "param_type": "VarTypeParam",
      "var_type": "torch.nn.Parameter",
      "attr_name": "grad",
      "pre_value": "non_zero",
      "post_value": null
    }
  ],
  "precondition": {
    "parent_func_call_pre": {
      "inverted": true,
      "preconditions": [
        {
          "clauses": [
            {
              "type": "constant",
              "prop_name": "meta_vars.step",
              "additional_path": "None",
              "prop_dtype": "int",
              "values": [
                0
              ]
            }
          ]
        },
        {
          "clauses": [
            {
              "type": "constant",
              "prop_name": "meta_vars.stage",
              "additional_path": "None",
              "prop_dtype": "str",
              "values": [
                "init",
                "testing"
              ]
            }
          ]
        }
      ]
    }
  },
  "num_positive_examples": 200,
  "num_negative_examples": 1
}
```

This invariant encodes the expectation that calling torch.optim.optimizer.Optimizer.zero_grad() should reset gradients: that is, the .grad attribute of torch.nn.Parameter objects should transition from a non-zero value to null (i.e., None or missing).

  • text_description:

    A human-readable summary of the invariant.

    > Note: This field is generated using a best-effort strategy and may not fully reflect the invariant’s semantics. In some cases, it may be missing or incomplete. 📆 We are planning to further formalize this field in the future.
  • relation: "APIContainRelation"

    An event is expected to happen within the duration of an API invocation.

  • params:

    • An API call: zero_grad() on a PyTorch optimizer
    • An attribute: .grad on a torch.nn.Parameter, which should change from a non-zero value ("pre_value": "non_zero") to null ("post_value": null) during the call
  • precondition: This invariant only applies outside the following contexts:

    • The first step of training (meta_vars.step == 0)
    • The init or testing stages (meta_vars.stage in {"init", "testing"})

      These are specified as inverted preconditions, meaning the invariant does not apply during those times (e.g., it’s okay to not clear .grad on the first step when nothing has been backpropagated yet). A sketch of how such an inverted precondition can be read follows this list.

  • num_positive_examples: 200

    This behavior was observed and confirmed 200 times in the reference traces.

  • num_negative_examples: 1

    The invariant failed once, in this case during the first training iteration, when .grad had not yet been populated before the zero_grad() call.

    > 🎯 This behavior is expected and correctly handled by the precondition, which excludes step 0.
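
To make the precondition semantics concrete, below is a minimal, hypothetical sketch of how such an inverted precondition could be evaluated against the flattened properties of a single trace event. It is illustrative only and does not reproduce TrainCheck’s checker implementation; in particular, the handling of non-inverted precondition groups here is an assumption.

```python
def clause_matches(clause: dict, event_props: dict) -> bool:
    # A "constant" clause matches when the named property takes one of the listed values.
    if clause["type"] != "constant":
        raise NotImplementedError(f"unsupported clause type: {clause['type']}")
    return event_props.get(clause["prop_name"]) in clause["values"]


def precondition_allows(precond: dict, event_props: dict) -> bool:
    # A group of clauses matches only when every clause in it matches.
    matched = [
        all(clause_matches(c, event_props) for c in group["clauses"])
        for group in precond["preconditions"]
    ]
    if precond.get("inverted", False):
        # Inverted: the invariant applies only when NO group matches.
        return not any(matched)
    # Non-inverted (assumption): the invariant applies when at least one group matches.
    return any(matched)


precond = {
    "inverted": True,
    "preconditions": [
        {"clauses": [{"type": "constant", "prop_name": "meta_vars.step", "values": [0]}]},
        {"clauses": [{"type": "constant", "prop_name": "meta_vars.stage",
                      "values": ["init", "testing"]}]},
    ],
}

print(precondition_allows(precond, {"meta_vars.step": 0, "meta_vars.stage": "training"}))   # False: skipped at step 0
print(precondition_allows(precond, {"meta_vars.step": 12, "meta_vars.stage": "training"}))  # True: checked during training
```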

Invariant Inference Workflow

At a high level, TrainCheck performs invariant inference in three stages (a simplified code sketch follows the list):

  1. Hypothesis Generation

    For each supported relation type, TrainCheck scans the provided traces and generates hypotheses by identifying patterns where a potential invariant could exist (i.e., when matching examples are observed).

  2. Example Collection

    For every hypothesis, TrainCheck performs a full scan across all provided traces to gather positive examples (where the hypothesized invariant holds) and negative examples (where it does not).

  3. Precondition Deduction

    TrainCheck analyzes the collected examples to infer a distinguishing predicate—a logical condition that holds true for all positive examples and false for negative ones. This predicate becomes the invariant’s precondition, reducing false positives during checking.
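
The loop below is a deliberately simplified, hypothetical sketch of these three stages. The helper callables (generate_hypotheses, check_example, deduce_precondition) are placeholders supplied by the caller; none of these names correspond to TrainCheck’s internal APIs or to the exact algorithms used in the paper.

```python
from typing import Callable, Iterable, List, Optional

Trace = List[dict]  # a trace is treated here as a flat list of event records


def infer_invariants(
    traces: List[Trace],
    generate_hypotheses: Callable[[List[Trace]], Iterable[dict]],
    check_example: Callable[[dict, dict], Optional[bool]],
    deduce_precondition: Callable[[List[dict], List[dict]], Optional[dict]],
) -> List[dict]:
    invariants = []
    # Stage 1: hypothesis generation -- propose candidate invariants from observed patterns.
    for hyp in generate_hypotheses(traces):
        positives, negatives = [], []
        # Stage 2: example collection -- scan every event in every trace.
        for trace in traces:
            for event in trace:
                verdict = check_example(hyp, event)  # True / False / None (event not relevant)
                if verdict is True:
                    positives.append(event)
                elif verdict is False:
                    negatives.append(event)
        # Stage 3: precondition deduction -- keep the hypothesis only if some predicate
        # separates the positive examples from the negative ones.
        precondition = deduce_precondition(positives, negatives)
        if precondition is not None:
            invariants.append({
                "hypothesis": hyp,
                "precondition": precondition,
                "num_positive_examples": len(positives),
                "num_negative_examples": len(negatives),
            })
    return invariants
```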

⚙️ For full details on the inference algorithms, please refer to our OSDI’25 paper (documentation is in progress).

🧪 Practical Guidelines: Choosing Input Pipelines

When selecting input pipelines for invariant inference, there are two main considerations:

  1. Representativeness

    You want your input pipelines to be diverse enough to infer a representative set of invariants. This helps:

    • Avoid overfitting to specific patterns.
    • Ensure that inferred invariants and preconditions remain accurate across varying scenarios.

    For example, if none of your input pipelines use mixed precision, TrainCheck might infer invariants like:

    "For mathematical operations, the output dtype must equal the input dtype."

    However, if mixed precision pipelines are included, TrainCheck will refine such invariants by adding preconditions like:

    "This applies only when a torch.autocast context manager is not active."

    ⚡ How many pipelines should you include? It depends on how different your target pipeline is from the available reference pipelines:

    • If the target is a minor variant of a known-good pipeline, using just that reference may suffice.
    • If the target pipeline introduces new frameworks, tasks, or architectures, include a broader set of inputs to improve generalization.

  2. Inference Time

    Inference time is generally not a major concern, since inference happens offline. However, due to the repetitive nature of training loops, you can safely shorten reference runs without sacrificing invariant quality.

    In practice:

    • For all bugs detected by TrainCheck so far, we limited inference traces to at most 100 iterations.
    • Shortened runs have shown no significant impact on the usefulness or accuracy of inferred invariants.

Core Principles – A Summary

  • Focus on the diversity of input traces — capturing different configurations, behaviors, or modes of operation.
  • The length or size of traces matters far less.
  • Efficient inference is achievable with short, representative runs.

Implementation Limitations

TrainCheck operates on large traces with a dynamic schema, where variable types and fields can change over time. This, combined with the need for cross-trace comparisons, limits the use of typical data storage solutions like SQL databases or optimized DataFrame libraries (e.g., Polars), which require fixed schemas.

To handle this, we use in-process Pandas DataFrames backed by NumPy. While effective, this approach is currently single-threaded due to Python’s GIL, leaving room for future performance improvements.

We are exploring options such as shared-memory DataFrames, schema standardization, or schemaless databases (e.g., MongoDB) if data transmission overhead proves manageable.

Note: While data sharding could improve parallelism, it would overcomplicate cross-trace and cross-time analysis and is better handled at the storage layer rather than within inference logic.