Score Methodology

How metrics become percentiles, and how percentiles become dimension scores

What a score represents

An Artalytics dimension score is a percentile rank within the artist’s own portfolio. A Time & Effort score of 92 means the artwork sits in the top 8% of labor investment for that artist — not against a global database, not against other artists.

The score is a function of canvas-file metadata, aggregated through a fixed mathematical pipeline. No subjective input enters the calculation. The same portfolio run through the same pipeline version always produces identical scores.

This page covers how the number is produced, how confident we are in it, how multiple dimension scores combine, and where the framework’s limits are.


The calculation pipeline

The full pipeline runs in five stages, each deterministic:

Stage 1 — Load canvas metadata

Every artwork in the artist’s portfolio has an associated canvas-metadata file containing:

  • Temporal data — stroke timestamps, active session boundaries, total creation time.
  • Spatial data — per-stroke coordinates, bounding boxes, canvas-coverage grids.
  • Chromatic data — per-stroke RGB values, palette introductions, spectrum occupation.
  • Behavioral data — technique-pattern sequences, session boundaries.

No image pixels are used. No AI classification is used. The metadata captures what the artist did, frame by frame.

Stage 2 — Compute raw metrics

Each dimension’s five proprietary metric families are computed as fixed mathematical functions of the metadata. Public documentation describes the dimensions at the foundation level; private/internal documentation contains the named metric definitions and validation bounds.

Every metric is a pure function of metadata. There is no per-artwork tuning.

Stage 3 — Convert raw metrics to percentiles

For each metric, the artwork’s raw value is ranked against the distribution of that same metric across all other artworks in the artist’s portfolio. The rank is converted to a percentile that runs from 1 (portfolio minimum) to 100 (portfolio maximum):

  • Values at or above the portfolio maximum → 100.
  • Values at or below the portfolio minimum → 1.
  • Values between min and max → interpolated rank-percentile.

Direction handling. Metric direction is defined in the scoring pipeline before percentile aggregation. Most metric families reward higher raw values; the private methodology identifies the exception and documents the inversion rule.
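The ranking rule above can be sketched in a few lines. This is an illustration only, not the production code: the function name `metric_percentile`, the `higher_is_better` flag, and the mid-rank formula used for values between the minimum and maximum are all assumptions, since the public documentation does not specify the interpolation method or how inversion is implemented.

```python
from bisect import bisect_left, bisect_right


def metric_percentile(value, others, higher_is_better=True):
    """Rank one raw metric value against the same metric for the other artworks.

    Assumptions (not from the published methodology): mid-rank interpolation
    for interior values, and inversion by negating values.
    """
    if not others:
        # Single-artwork portfolio: Stage 5 says scores default to 50.
        return 50.0
    if not higher_is_better:
        # Flip the direction so that "larger is better" holds below.
        value = -value
        others = [-v for v in others]
    ref = sorted(others)
    if value >= ref[-1]:
        return 100.0  # at or above the portfolio maximum
    if value <= ref[0]:
        return 1.0    # at or below the portfolio minimum
    below = bisect_left(ref, value)           # values strictly below
    equal = bisect_right(ref, value) - below  # ties with the value
    pct = 100.0 * (below + 0.5 * equal) / len(ref)
    return min(100.0, max(1.0, pct))
```

With `others = [10, 20, 30, 40]`, a raw value of 25 lands at 50.0, a value of 40 or more at 100.0, and a value of 10 or less at 1.0.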

Stage 4 — Average per-metric percentiles into a dimension score

Each dimension score is the arithmetic mean of its five metric percentiles, rounded to one decimal:

\[ \text{DimensionScore} = \frac{1}{5} \sum_{i=1}^{5} p_i \]

Equal weighting is used by default. Weight tuning is on the validation roadmap but not in production today.
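As a minimal sketch of the equal-weight average, assuming the five metric percentiles are already available as plain numbers (the function name `dimension_score` and the sample values are hypothetical):

```python
from statistics import fmean


def dimension_score(percentiles):
    """Arithmetic mean of exactly five metric percentiles, rounded to one decimal."""
    if len(percentiles) != 5:
        raise ValueError("a dimension score needs exactly five metric percentiles")
    return round(fmean(percentiles), 1)


# Hypothetical percentiles for one dimension of one artwork.
example = dimension_score([92, 88, 75, 81, 96])  # (92 + 88 + 75 + 81 + 96) / 5 = 86.4
```

Because every metric carries the same weight, a single low percentile pulls the dimension score down by exactly one fifth of its shortfall.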

Stage 5 — Assign confidence level

Confidence reflects how meaningful the portfolio-relative comparison is given portfolio size:

Portfolio size | Confidence | What it means
> 10 artworks | High | Percentile distribution is stable; scores are informative.
4 – 10 artworks | Medium | Scores directionally useful; single artworks can shift the baseline materially.
2 – 3 artworks | Low | Percentiles have little statistical meaning; report with explicit caveat.
1 artwork | N/A | All scores default to 50 with a “first-artwork” flag. No comparison is possible.

The confidence label is always displayed alongside the score.
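The size thresholds in the table translate directly into a lookup. The sketch below is illustrative; the function name `confidence_level` is an assumption, and the single-artwork case is checked first so that it is reported as N/A rather than Low.

```python
def confidence_level(portfolio_size):
    """Map the number of artworks in a portfolio to a confidence label."""
    if portfolio_size < 1:
        raise ValueError("portfolio must contain at least one artwork")
    if portfolio_size == 1:
        return "N/A"     # first-artwork case: scores default to 50
    if portfolio_size <= 3:
        return "Low"
    if portfolio_size <= 10:
        return "Medium"
    return "High"        # more than 10 artworks
```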


Reading a single dimension score

A standalone dimension score lands in one of four interpretive bands:

Range | Interpretation
85 – 100 | Peak-tier work for this artist along this dimension.
60 – 84 | Above-median, dedicated work.
40 – 59 | Standard output — this artist’s baseline.
Below 40 | Studies, sketches, experimental, or low-investment work along this dimension.

“Above-median” does not mean “good” and “below 40” does not mean “bad.” The score describes where the work sits on that dimension within this artist’s range — intent and style determine whether that’s desirable.
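A band lookup might look like the sketch below. The function name and the short band labels are hypothetical. One assumption deserves a flag: scores are rounded to one decimal, so a value such as 84.5 falls between the printed ranges; this sketch treats each lower bound as inclusive and assigns 84.5 to the 60 – 84 band, which the published table does not confirm.

```python
def interpretive_band(score):
    """Return the interpretive band for a dimension score (lower bounds inclusive)."""
    if score >= 85:
        return "Peak-tier"
    if score >= 60:
        return "Above-median"
    if score >= 40:
        return "Standard"
    return "Below 40"
```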


Reading the three dimensions together

The three dimensions are designed to be read together, not collapsed. Certain multi-dimensional patterns are interpretable at a glance:

Pattern | Interpretation
High Time & Effort + High Skill & Artistry + High Complexity & Detail | Comprehensive peak work. Portfolio highlight.
High Skill & Artistry + Low Complexity & Detail | Deliberate minimalism. Mastery exercised toward simplicity.
High Complexity & Detail + Low Skill & Artistry | Ambitious but still-developing execution.
High Time & Effort + Low Skill & Artistry | Extensive labor, exploratory or early-career work.
Low Time & Effort + High Skill & Artistry | Rapid virtuoso execution. Confidence of technique.
Uniformly Low | Study or sketch.

These are patterns, not grades. A “high all three” work is not necessarily better than a “high skill, deliberate minimalism” work — they serve different artistic intents.
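The pattern table can be expressed as a small rule set. The page does not define numeric cut-offs for "High" and "Low", so the thresholds here (high at 60 or above, low below 40, borrowed from the single-dimension bands) are purely an assumption for illustration, as is the function name `read_patterns`. More than one pattern can apply to the same score card, so the sketch returns every match in table order.

```python
def read_patterns(time_effort, skill, complexity, high=60.0, low=40.0):
    """List the table patterns that match a three-dimension score card.

    The `high` and `low` thresholds are hypothetical defaults.
    """
    def hi(score):
        return score >= high

    def lo(score):
        return score < low

    matches = []
    if hi(time_effort) and hi(skill) and hi(complexity):
        matches.append("Comprehensive peak work")
    if hi(skill) and lo(complexity):
        matches.append("Deliberate minimalism")
    if hi(complexity) and lo(skill):
        matches.append("Ambitious but still-developing execution")
    if hi(time_effort) and lo(skill):
        matches.append("Extensive labor, exploratory or early-career work")
    if lo(time_effort) and hi(skill):
        matches.append("Rapid virtuoso execution")
    if lo(time_effort) and lo(skill) and lo(complexity):
        matches.append("Study or sketch")
    return matches
```

A card with mid-range scores on every dimension matches none of the patterns and returns an empty list, which is consistent with reading the patterns as descriptions rather than grades.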


Limitations

Portfolio-relative, not market-absolute

Scores describe position within an artist’s output. They do not translate to dollar value. The same 92nd-percentile score means very different things across artists at different market tiers. Any use in pricing, appraisal, or lending must combine framework output with independent market data.

Portfolio-size sensitivity

Small portfolios produce volatile percentiles. A score based on 5 artworks can swing by 15–20 points when a sixth is added. The confidence label addresses this, but it remains a structural limit.
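To make the volatility concrete, the sketch below ranks one artwork against four others and then against five. It uses a deliberately simplified percentile (the share of other artworks with a strictly lower raw value), which is an assumption and not the production interpolation; the raw values are invented for illustration.

```python
def percentile_among(value, others):
    """Simplified percentile: share of other artworks with a strictly lower value."""
    below = sum(1 for v in others if v < value)
    return 100.0 * below / len(others)


artwork = 35

# Five-artwork portfolio: the artwork is compared against four others.
before = percentile_among(artwork, [10, 20, 30, 40])      # 3 of 4 below -> 75.0

# A sixth artwork with raw value 38 joins the portfolio.
after = percentile_among(artwork, [10, 20, 30, 38, 40])   # 3 of 5 below -> 60.0

swing = before - after                                    # 15.0 points
```

The artwork itself did not change; only the reference distribution did. With ten or more artworks, one addition moves each step of the ranking by a much smaller fraction.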

Style-dependent interpretation

Some styles systematically produce certain metric patterns (e.g., underpaint-then-detail workflows produce unstable frame distributions). The portfolio-relative framing normalizes most of this, but edge cases exist where one artist’s natural baseline would be another’s outlier.

Design-stage validation

Formal validation (test-retest reliability studies, cross-artist correlation panels, expert-rater agreement) is described in the Validation Framework. These are planned protocols with stated maturity levels, not current empirical results. The scoring arithmetic is deterministic and well defined; the claim that scores correlate with market or expert judgment remains at design stage and has not yet been tested empirically.

Not a substitute for expertise

Framework outputs inform, but do not replace: certified appraisal, provenance research, market-comparable analysis, conservator examination, or authentication procedures.


Reproducibility guarantees

  • Deterministic. The same portfolio through the same pipeline version produces byte-identical outputs.
  • Versioned. Metric definitions, validation bounds, and aggregation rules carry a pipeline version. Score changes across versions are tracked in release notes.
  • Auditable. Every metric has a documented formula, production validation checks, and unit-tested edge cases (see the internal-profile detail sections).
  • No personalization. No artist-specific tuning parameters exist in production. A new artist’s portfolio is scored by the same pipeline that scored every prior artist’s.
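One way a consumer of the scores could check the determinism and versioning guarantees is to fingerprint each result and compare fingerprints across runs. This is a sketch of that idea, not a description of how Artalytics implements it; the function name, the payload layout, and the version string are all hypothetical.

```python
import hashlib
import json


def score_fingerprint(scores, pipeline_version):
    """SHA-256 over a canonical JSON encoding of the scores plus pipeline version."""
    payload = {"pipeline_version": pipeline_version, "scores": scores}
    # Sorted keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = score_fingerprint(
    {"time_effort": 89.0, "skill_artistry": 82.0, "complexity_detail": 94.0}, "1.0.0"
)
run_b = score_fingerprint(
    {"complexity_detail": 94.0, "skill_artistry": 82.0, "time_effort": 89.0}, "1.0.0"
)
```

Two runs of the same portfolio under the same version should yield the same fingerprint; a differing fingerprint under an unchanged version would indicate a reproducibility problem worth investigating.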

Worked example — reading a score card

Suppose an artwork scores:

  • Time & Effort: 89 (high confidence)
  • Skill & Artistry: 82 (high confidence)
  • Complexity & Detail: 94 (high confidence)

Reading this card:

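  1. All three are above 80. Time & Effort (89) and Complexity & Detail (94) fall in the peak-tier band (85 – 100); Skill & Artistry (82) sits near the top of the above-median band (60 – 84). Taken together, this is among the artist’s strongest work.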
  2. Complexity is highest. The work is particularly distinguished by intricacy — stroke density, spatial variability, or compositional planning exceeds this artist’s typical range.
  3. High confidence across all three. The artist’s portfolio is mature enough that these percentiles are stable.
  4. Adjacent bands, narrow spread. The three scores sit within 12 points of each other, so there is no interpretive tension: this isn’t a “high effort but low skill” or “complex but sloppy” pattern. It’s consistently strong.

Practitioners working with framework outputs (lenders, wealth managers, curators) combine this with market data, provenance, and domain expertise. The framework provides the quantitative layer; the final judgment remains theirs.