Score Methodology
How metrics become percentiles, and how percentiles become dimension scores
What a score represents
An Artalytics dimension score is a percentile rank within the artist’s own portfolio. A Time & Effort score of 92 means the artwork sits in the top 8% of labor investment for that artist — not against a global database, not against other artists.
The score is a function of canvas-file metadata, aggregated through a fixed mathematical pipeline. No subjective input enters the calculation: the same portfolio run through the same pipeline version produces identical scores.
This page covers how the number is produced, how confident we are in it, how multiple dimension scores combine, and where the framework’s limits are.
The calculation pipeline
The full pipeline runs in five stages, each deterministic:
Stage 1 — Load canvas metadata
Every artwork in the artist’s portfolio has an associated canvas-metadata file containing:
- Temporal data — stroke timestamps, active session boundaries, total creation time.
- Spatial data — per-stroke coordinates, bounding boxes, canvas-coverage grids.
- Chromatic data — per-stroke RGB values, palette introductions, spectrum occupation.
- Behavioral data — technique-pattern sequences, session boundaries.
No image pixels are used. No AI classification is used. The metadata captures what the artist did, frame by frame.
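As a rough illustration of the data categories listed above, the metadata could be modeled along the following lines. This is a hypothetical sketch, not the actual file format: the class names (`Stroke`, `CanvasMetadata`), the field names, the time unit, and the idea that total creation time is the sum of active session durations are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Stroke:
    """One recorded stroke (hypothetical structure)."""
    timestamp: float                      # assumed unit: seconds since canvas creation
    points: List[Tuple[float, float]]     # per-stroke (x, y) coordinates
    rgb: Tuple[int, int, int]             # per-stroke color

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (min_x, min_y, max_x, max_y) for this stroke."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return (min(xs), min(ys), max(xs), max(ys))


@dataclass
class CanvasMetadata:
    """Per-artwork metadata container (hypothetical structure)."""
    strokes: List[Stroke] = field(default_factory=list)
    # Active sessions as (start, end) pairs in the same assumed time unit.
    sessions: List[Tuple[float, float]] = field(default_factory=list)

    def total_creation_time(self) -> float:
        """Assumption: total creation time is the sum of session durations."""
        return sum(end - start for start, end in self.sessions)
```

Note that nothing in this structure refers to rendered pixels, which matches the point that the pipeline works from what the artist did rather than from the finished image.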
Stage 2 — Compute raw metrics
Each dimension’s five proprietary metric families are computed as fixed mathematical functions of the metadata. Public documentation describes the dimensions at the foundation level; private/internal documentation contains the named metric definitions and validation bounds.
Every metric is a pure function of the metadata; no per-artwork tuning is applied.
Stage 3 — Convert raw metrics to percentiles
For each metric, the artwork’s raw value is ranked against the distribution of that same metric across all other artworks in the artist’s portfolio. The rank is converted to a percentile on a 100-point scale with a floor of 1:
- Values at or above the portfolio maximum → 100.
- Values at or below the portfolio minimum → 1.
- Values between min and max → interpolated rank-percentile.
Direction handling. Metric direction is defined in the scoring pipeline before percentile aggregation. Most metric families reward higher raw values; the private methodology identifies the exception and documents the inversion rule.
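The conversion rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production implementation: the function name `metric_percentile`, the `higher_is_better` flag, the linear interpolation of rank between 1 and 100, the mid-rank treatment of ties, and the choice to pass the full portfolio (including the artwork being scored) are all hypothetical. The exact interpolation and inversion rules are defined in the private methodology.

```python
from typing import Sequence


def metric_percentile(value: float,
                      portfolio: Sequence[float],
                      higher_is_better: bool = True) -> float:
    """Rank `value` within `portfolio` and map the rank to a 1-100 percentile."""
    # A single-artwork portfolio allows no comparison; scores default to 50.
    if len(portfolio) <= 1:
        return 50.0

    # Assumed inversion rule: negate values so that "lower is better"
    # metrics can reuse the same ranking logic.
    if not higher_is_better:
        value = -value
        portfolio = [-x for x in portfolio]

    if value >= max(portfolio):
        return 100.0
    if value <= min(portfolio):
        return 1.0

    # Zero-based rank, averaging over ties (mid-rank).
    below = sum(1 for x in portfolio if x < value)
    equal = sum(1 for x in portfolio if x == value)
    mean_index = below + max(equal - 1, 0) / 2

    # Assumed linear interpolation between the floor (1) and ceiling (100).
    return 1.0 + 99.0 * mean_index / (len(portfolio) - 1)
```

For a five-artwork portfolio with raw values 10, 20, 30, 40, 50, this sketch places the middle artwork at 50.5, the lowest at 1, and the highest at 100. With `higher_is_better=False`, the lowest raw value maps to 100 instead.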
Stage 4 — Average per-metric percentiles into a dimension score
Each dimension score is the arithmetic mean of its five metric percentiles, rounded to one decimal:
\[ \text{DimensionScore} = \frac{1}{5} \sum_{i=1}^{5} p_i \]
Equal weighting is used by default. Weight tuning is on the validation roadmap but not in production today.
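The equal-weight average can be written in a few lines. The function name `dimension_score` is illustrative, and the check for exactly five inputs is an assumption based on the five metric families per dimension described above.

```python
from typing import Sequence


def dimension_score(percentiles: Sequence[float]) -> float:
    """Arithmetic mean of five metric percentiles, rounded to one decimal."""
    if len(percentiles) != 5:
        raise ValueError("expected exactly five metric percentiles")
    return round(sum(percentiles) / 5, 1)
```

For example, metric percentiles of 92.5, 88.0, 95.5, 90.0, and 91.3 sum to 457.3, so the dimension score is 91.5.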
Stage 5 — Assign confidence level
Confidence reflects how meaningful the portfolio-relative comparison is given portfolio size:
| Portfolio size | Confidence | What it means |
|---|---|---|
| > 10 artworks | High | Percentile distribution is stable; scores are informative. |
| 4 – 10 artworks | Medium | Scores directionally useful; single artworks can shift the baseline materially. |
| 2 – 3 artworks | Low | Percentiles have little statistical meaning; report with explicit caveat. |
| 1 artwork | N/A | All scores default to 50 with a “first-artwork” flag. No comparison is possible. |
The confidence label is always displayed alongside the score.
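The table maps directly onto a small lookup. In this sketch the function name `confidence_label` is hypothetical, and the single-artwork case is checked first so that it takes precedence over the Low tier.

```python
def confidence_label(portfolio_size: int) -> str:
    """Map portfolio size to the confidence label shown beside each score."""
    if portfolio_size < 1:
        raise ValueError("portfolio must contain at least one artwork")
    if portfolio_size == 1:
        return "N/A"      # first-artwork case: scores default to 50
    if portfolio_size <= 3:
        return "Low"
    if portfolio_size <= 10:
        return "Medium"
    return "High"
```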
Reading a single dimension score
A standalone dimension score lands in one of four interpretive bands:
| Range | Interpretation |
|---|---|
| 85 – 100 | Peak-tier work for this artist along this dimension. |
| 60 – 84 | Above-median, dedicated work. |
| 40 – 59 | Standard output — this artist’s baseline. |
| Below 40 | Studies, sketches, experimental, or low-investment work along this dimension. |
“Above-median” does not mean “good” and “below 40” does not mean “bad.” The score describes where the work sits on that dimension within this artist’s range — intent and style determine whether that’s desirable.
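A band lookup might look like the sketch below. Because dimension scores are rounded to one decimal, a value such as 84.5 falls between the printed ranges; this sketch assumes each band's lower bound acts as the threshold, which is an interpretation rather than a documented rule. The function name and the short band labels are also illustrative.

```python
def interpretive_band(score: float) -> str:
    """Return the interpretive band for a dimension score (assumed thresholds)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "peak-tier"
    if score >= 60:
        return "above-median"
    if score >= 40:
        return "baseline"
    return "study or low-investment"
```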
Reading the three dimensions together
The three dimensions are designed to be read together, not collapsed. Certain multi-dimensional patterns are interpretable at a glance:
| Pattern | Interpretation |
|---|---|
| High Time & Effort + High Skill & Artistry + High Complexity & Detail | Comprehensive peak work. Portfolio highlight. |
| High Skill & Artistry + Low Complexity & Detail | Deliberate minimalism. Mastery exercised toward simplicity. |
| High Complexity & Detail + Low Skill & Artistry | Ambitious but still-developing execution. |
| High Time & Effort + Low Skill & Artistry | Extensive labor, exploratory or early-career work. |
| Low Time & Effort + High Skill & Artistry | Rapid virtuoso execution. Confidence of technique. |
| Uniformly Low | Study or sketch. |
These are patterns, not grades. A “high all three” work is not necessarily better than a “high skill, deliberate minimalism” work — they serve different artistic intents.
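The pattern table can be expressed as a simple rule set. The table does not define numeric cutoffs for "High" and "Low", so the thresholds below (85 and 40, borrowed from the single-dimension bands) are assumptions, as are the function and constant names. Because a card can satisfy more than one row, the sketch returns every matching pattern rather than a single label.

```python
from typing import List

HIGH_CUTOFF = 85.0  # assumed: "High" means the peak-tier band
LOW_CUTOFF = 40.0   # assumed: "Low" means below the baseline band


def matching_patterns(effort: float, skill: float, complexity: float) -> List[str]:
    """Return all multi-dimensional patterns that a score card matches."""
    def high(s: float) -> bool:
        return s >= HIGH_CUTOFF

    def low(s: float) -> bool:
        return s < LOW_CUTOFF

    patterns: List[str] = []
    if high(effort) and high(skill) and high(complexity):
        patterns.append("comprehensive peak work")
    if high(skill) and low(complexity):
        patterns.append("deliberate minimalism")
    if high(complexity) and low(skill):
        patterns.append("ambitious but still-developing execution")
    if high(effort) and low(skill):
        patterns.append("extensive labor, exploratory or early-career work")
    if low(effort) and high(skill):
        patterns.append("rapid virtuoso execution")
    if low(effort) and low(skill) and low(complexity):
        patterns.append("study or sketch")
    return patterns
```

An empty result means the card matches none of the listed patterns, which is expected for mid-range work.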
Limitations
Portfolio-relative, not market-absolute
Scores describe position within an artist’s output. They do not translate to dollar value. The same 92nd-percentile score means very different things across artists at different market tiers. Any use in pricing, appraisal, or lending must combine framework output with independent market data.
Portfolio-size sensitivity
Small portfolios produce volatile percentiles. A score based on 5 artworks can swing by 15–20 points when a sixth is added. The confidence label addresses this, but it remains a structural limit.
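To see where swings of this size come from, consider a worked example under one assumption: that the rank-to-percentile step interpolates linearly, \(p = 1 + 99\,r/(n-1)\), where \(r\) is the artwork’s zero-based rank on a metric and \(n\) is the portfolio size. The exact interpolation rule is not specified on this page, so the formula is illustrative only. An artwork ranked fourth of five (\(r = 3\)) keeps the same rank when a sixth, higher-valued artwork is added:
\[ p_{n=5} = 1 + 99 \cdot \tfrac{3}{4} = 75.25, \qquad p_{n=6} = 1 + 99 \cdot \tfrac{3}{5} = 60.4 \]
That is a drop of about 15 points on a single metric with no change to the artwork itself.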
Style-dependent interpretation
Some styles systematically produce certain metric patterns (e.g., underpaint-then-detail workflows produce unstable frame distributions). The portfolio-relative framing normalizes most of this, but edge cases exist where one artist’s natural baseline would be another’s outlier.
Design-stage validation
Formal validation — retest reliability studies, cross-artist correlation panels, expert-rater agreement — is described in the Validation Framework. These are planned protocols with stated maturity levels, not current empirical results. The scoring method is mathematically sound; the claim that scores correlate with market or expert judgment is design-stage.
Not a substitute for expertise
Framework outputs inform, but do not replace: certified appraisal, provenance research, market-comparable analysis, conservator examination, or authentication procedures.
Reproducibility guarantees
- Deterministic. The same portfolio through the same pipeline version produces byte-identical outputs.
- Versioned. Metric definitions, validation bounds, and aggregation rules carry a pipeline version. Score changes across versions are tracked in release notes.
- Auditable. Every metric has a documented formula, production validation checks, and unit-tested edge cases, recorded in the internal-profile detail sections.
- No personalization. No artist-specific tuning parameters exist in production. A new artist’s portfolio is scored by the same pipeline that scored every prior artist’s.
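One way a consumer of the scores could check the determinism guarantee is to fingerprint each run's output and compare digests. The sketch below is an assumption about how such a check might be done, not a description of the production tooling: the function name `score_digest`, the dictionary layout, and the version string format are all hypothetical.

```python
import hashlib
import json
from typing import Mapping


def score_digest(scores: Mapping[str, float], pipeline_version: str) -> str:
    """Return a SHA-256 digest of a score set in a canonical JSON form."""
    payload = json.dumps(
        {"pipeline_version": pipeline_version, "scores": dict(scores)},
        sort_keys=True,            # key order must not affect the digest
        separators=(",", ":"),     # fixed whitespace for a stable byte string
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs over the same portfolio with the same pipeline version should produce the same digest. Including the version in the payload means a version bump changes the digest even when the scores happen to match.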
Worked example — reading a score card
Suppose an artwork scores:
- Time & Effort: 89 (high confidence)
- Skill & Artistry: 82 (high confidence)
- Complexity & Detail: 94 (high confidence)
Reading this card:
- All three are above 80. Time & Effort (89) and Complexity & Detail (94) fall in the 85–100 peak band; Skill & Artistry (82) sits at the top of the above-median band. Taken together, this reads as peak-tier work for this artist.
- Complexity is highest. The work is particularly distinguished by intricacy: stroke density, spatial variability, or compositional planning exceeds this artist’s typical range.
- High confidence across all three. The artist’s portfolio is large enough (more than 10 artworks) that these percentiles are stable.
- No interpretive tension. The scores span two adjacent bands, so this isn’t a “high effort but low skill” or “complex but sloppy” pattern. It’s consistently strong.
Practitioners working with framework outputs (lenders, wealth managers, curators) combine this with market data, provenance, and domain expertise. The framework provides the quantitative layer; the final judgment remains theirs.