# Unweighted Average Completion Time Is Not a Fair Metric for Task Scheduling

A mathematical proof that unweighted average task completion time is a biased
statistic that incentivizes cherry-picking easy work, and that any scheduling
advantage it appears to reveal is an artifact of the metric — not a reflection
of genuine throughput or service quality.

---

## 1. Introduction

Many organizations measure task-execution performance by **unweighted mean
completion time**: the average number of hours (or days) between task
submission and task resolution, counting each task equally regardless of
size or priority.

This paper proves that this metric is not merely imprecise but structurally
biased. It can be improved by reordering work without doing any additional
work (Theorem 1), while a properly weighted alternative is completely
immune to scheduling manipulation (Theorem 2). When combined with a
priority system, the metric actively contradicts the organization's own
priority classifications (Theorem 8).

The argument proceeds in four parts:

- **Part I** (Sections 2–4) establishes the mathematical foundation:
  the unweighted mean is gameable by Shortest Processing Time (SPT)
  scheduling, the work-weighted mean is schedule-invariant, and the
  resulting service-quality consequences are provably negative.

- **Part II** (Sections 5–6) extends the model to priority-classified
  tasks, proves the metric becomes adversarial to the priority system,
  and proposes weighted alternatives with a worked IT service desk example.

- **Part III** (Sections 7–9) examines organizational dynamics: what
  happens when the metric is reported to clients (information asymmetry),
  what happens to team members who understand its flaws (psychological
  harm), and what a single informed manager can do about it (constrained
  optimization with game-theoretic stability analysis).

- **Part IV** (Sections 10–12) presents honest counterarguments, situates
  the work in existing literature, and concludes.

The core results build on Smith's (1956) foundational scheduling theory [1],
extended through game theory [9, 10], organizational measurement theory
[18, 19], and psychology [11–17] to trace a complete chain from a
mathematical proof about a specific metric to organizational outcomes.

---

# Part I: Mathematical Foundation

## 2. Definitions

Let there be **n** tasks with positive processing times
$p_1, p_2, \ldots, p_n$, all available at time zero.

A **schedule** $\sigma$ is a permutation of $\{1, 2, \ldots, n\}$ assigning
tasks to execution order on a single executor.

The **completion time** of task $\sigma(k)$ under schedule $\sigma$ is:

$$C_{\sigma(k)} = \sum_{j=1}^{k} p_{\sigma(j)}$$

The **unweighted mean completion time** is:

$$\bar{C}(\sigma) = \frac{1}{n} \sum_{k=1}^{n} C_{\sigma(k)}$$

The **work-weighted mean completion time** is:

$$\bar{C}_w(\sigma) = \frac{\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)}}{\sum_{k=1}^{n} p_{\sigma(k)}}$$

---

## 3. Core Results

### 3.1 The Unweighted Mean Is Gameable

**Theorem 1** (Smith, 1956 [1])**.** The schedule that minimizes
$\bar{C}(\sigma)$ is Shortest Processing Time first (SPT): sort tasks so
that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$.

**Proof (exchange argument [1, 2]).**

Consider any schedule $\sigma$ in which two adjacent tasks $i, j$ satisfy
$p_i > p_j$ with task $i$ scheduled immediately before task $j$. Let $t$
be the start time of task $i$.

| | Task $i$ finishes | Task $j$ finishes | Sum |
|---|---|---|---|
| **Before swap** ($i$ then $j$) | $t + p_i$ | $t + p_i + p_j$ | $2t + 2p_i + p_j$ |
| **After swap** ($j$ then $i$) | $t + p_j$ | $t + p_j + p_i$ | $2t + p_i + 2p_j$ |

Swapping reduces the sum of completion times by:

$$(2p_i + p_j) - (p_i + 2p_j) = p_i - p_j > 0$$

The completion times of all other tasks are unchanged, so every swap of a
longer-before-shorter adjacent pair strictly reduces the total. Any non-SPT
schedule contains such a pair, and each swap removes one inversion, so
finitely many swaps reach SPT. Therefore SPT minimizes $\bar{C}(\sigma)$,
uniquely up to the order of tasks with equal processing times. $\blacksquare$

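The exchange argument can be checked by brute force on a small instance. The sketch below is illustrative only: the five processing times are hypothetical values chosen for the check, and `completion_times` and `unweighted_mean` are helper names introduced here, not part of the paper. It enumerates every permutation and confirms that the order minimizing $\bar{C}(\sigma)$ is the ascending one.

```python
from itertools import permutations

def completion_times(order):
    """Cumulative finish times for processing times listed in execution order."""
    finish, elapsed = [], 0.0
    for p in order:
        elapsed += p
        finish.append(elapsed)
    return finish

def unweighted_mean(order):
    """Unweighted mean completion time of one execution order."""
    times = completion_times(order)
    return sum(times) / len(times)

# Hypothetical processing times in hours (not taken from the text).
tasks = [4, 1, 7, 2, 5]

# Exhaustive search over all 5! = 120 execution orders.
best_order = min(permutations(tasks), key=unweighted_mean)
spt_order = tuple(sorted(tasks))

print(best_order, unweighted_mean(best_order))
print(best_order == spt_order)
```

Because these processing times are distinct, the minimizer is unique. With ties, any ordering of the tied tasks would give the same mean.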
### 3.2 The Work-Weighted Mean Is Schedule-Invariant

**Theorem 2.** The work-weighted mean completion time $\bar{C}_w(\sigma)$
is the same for every schedule $\sigma$.

**Proof.**

Expand the numerator:

$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_{k=1}^{n} p_{\sigma(k)} \sum_{j=1}^{k} p_{\sigma(j)}$$

Reindex by letting $a = \sigma(k)$ and $b = \sigma(j)$. The double sum
counts every ordered pair $(a, b)$ where $b$ is scheduled no later than $a$:

$$= \sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b$$

For any pair $(a, b)$ with $a \ne b$, exactly one of
$\{b \preceq_\sigma a\}$ or $\{a \prec_\sigma b\}$ holds. The diagonal
terms ($a = b$) contribute $p_a^2$ regardless of order. Therefore:

$$\sum_{\substack{a, b \\ b \preceq_\sigma a}} p_a \, p_b = \sum_{a} p_a^2 + \sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b$$

Together with the complementary sum, the two off-diagonal sums cover every
ordered pair $(a, b)$ with $a \ne b$ exactly once:

$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b + \sum_{\substack{a \ne b \\ a \prec_\sigma b}} p_a \, p_b = \sum_{a \ne b} p_a \, p_b$$

The right-hand side is schedule-independent. Relabeling
$a \leftrightarrow b$ turns the second sum into the first, so by symmetry
of $p_a p_b$ both off-diagonal sums are equal:

$$\sum_{\substack{a \ne b \\ b \prec_\sigma a}} p_a \, p_b = \frac{1}{2} \sum_{a \ne b} p_a \, p_b$$

Therefore:

$$\sum_{k=1}^{n} p_{\sigma(k)} \cdot C_{\sigma(k)} = \sum_a p_a^2 + \frac{1}{2} \sum_{a \ne b} p_a \, p_b = \frac{1}{2}\left(\sum_a p_a\right)^2 + \frac{1}{2}\sum_a p_a^2$$

This expression contains no reference to $\sigma$. Since the denominator
$\sum p_a$ is also schedule-independent:

$$\bar{C}_w(\sigma) = \frac{\frac{1}{2}\left(\sum p_a\right)^2 + \frac{1}{2}\sum p_a^2}{\sum p_a}$$

is **constant across all schedules**. $\blacksquare$

This is an instance of the conservation laws in scheduling identified by
Coffman, Shanthikumar, and Yao [20]. The invariance corresponds to
measuring how long a unit of *work* waits rather than how long a *task*
waits — the unweighted statistic counts completions rather than work,
which is why it is gameable. (See also Little [3, 4] for the queueing-
theoretic context, with the caveat that Little's Law applies directly
only to steady-state systems, not to the batch case analyzed here.)

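The invariance can also be confirmed numerically. The following is a minimal sketch, assuming a hypothetical five-task instance; `work_weighted_mean` and `closed_form` are names introduced here for illustration. It evaluates $\bar{C}_w(\sigma)$ for every permutation and compares the result with the schedule-free expression from the proof.

```python
from itertools import permutations

def work_weighted_mean(order):
    """Work-weighted mean completion time for processing times in execution order."""
    elapsed, weighted_sum = 0.0, 0.0
    for p in order:
        elapsed += p
        weighted_sum += p * elapsed  # each task's completion time counts p times
    return weighted_sum / sum(order)

def closed_form(times):
    """Schedule-free expression from the proof of Theorem 2."""
    total = sum(times)
    return (0.5 * total ** 2 + 0.5 * sum(p * p for p in times)) / total

tasks = [4, 1, 7, 2, 5]  # hypothetical processing times in hours
values = [work_weighted_mean(order) for order in permutations(tasks)]

print(closed_form(tasks))          # value predicted by the proof
print(max(values) - min(values))   # spread across all 120 schedules
```

For this instance the closed form evaluates to 12.0 hours, and every permutation returns the same value.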
### 3.3 Illustrative Example

Two tasks: $A$ with $p_A = 1$ hour, $B$ with $p_B = 10$ hours.

| Schedule | $C_A$ | $C_B$ | Unweighted mean | Work-weighted mean |
|----------|-------|-------|-----------------|-------------------|
| SPT (A first) | 1 | 11 | 6.0 | 111/11 ≈ 10.09 |
| Reverse (B first) | 11 | 10 | 10.5 | 111/11 ≈ 10.09 |

SPT appears **4.5 hours better** on the unweighted metric but provides
**zero improvement** on the work-weighted metric. The apparent advantage
exists only because the unweighted statistic lets a 1-hour task "vote"
equally with a 10-hour task.

---

## 4. Consequences for Service Quality

### 4.1 Starvation of Large Tasks

**Theorem 3 (Metric Bias).** Any scheduling policy that minimizes
unweighted mean completion time necessarily maximizes the completion time
of the largest task.

**Proof.** SPT places the largest task last. Its completion time equals
the total processing time $\sum p_i$, which is the maximum possible
completion time for any individual task. Under any schedule that does not
place the largest task last, that task completes strictly earlier.
$\blacksquare$

This creates a **starvation incentive**: rational agents optimizing the
unweighted statistic will defer large tasks in favor of small ones, and
will do so indefinitely if small tasks keep arriving. Austin [18]
identified this general pattern — that incomplete
measurement creates incentives to optimize the measured dimension at the
expense of unmeasured ones — in the context of organizational performance
management. Theorem 3 provides the specific mechanism for task scheduling.

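The starvation incentive is easiest to see once tasks keep arriving. The simulation below is a sketch under stated assumptions, not a model taken from the paper: one large task waits from $t = 0$, small tasks of a hypothetical size arrive at a fixed hypothetical interval, and the executor applies non-preemptive SPT. The function name and parameters are introduced here for illustration.

```python
def simulate_spt(horizon, gap=1.0, small_size=1.0):
    """Non-preemptive SPT on one executor with one big task waiting from t=0.

    A small task (assumed shorter than the big one) arrives every `gap`
    hours, starting at t=0. Returns a pair: the start time of the big task
    (None if it has not started by `horizon`) and the number of small tasks
    completed.
    """
    clock, next_arrival = 0.0, 0.0
    waiting_small, small_done = 0, 0
    while clock < horizon:
        # Admit every small task that has arrived by the current time.
        while next_arrival <= clock:
            waiting_small += 1
            next_arrival += gap
        if waiting_small == 0:
            return clock, small_done  # no small task is waiting: big task starts
        # SPT: a waiting small task always goes ahead of the big one.
        waiting_small -= 1
        clock += small_size
        small_done += 1
    return None, small_done

print(simulate_spt(1000.0))           # small tasks arrive as fast as they are served
print(simulate_spt(1000.0, gap=2.0))  # slack between arrivals
```

In the first call the small tasks alone saturate the executor, so the large task never starts within the horizon. In the second call there is slack, and the large task starts as soon as the queue of small tasks is empty. Indefinite deferral therefore depends on the arrival rate of small work, which is an assumption of this sketch rather than a claim of the theorem.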
### 4.2 Maximum Completion Time for the Largest Task

**Theorem 4 (SPT Maximizes Completion Time of the Largest Task).**
SPT assigns the maximum possible completion time ($\sum p_i$) to the
largest task. The same holds for any schedule that places the largest task
last, and for no other schedule.

**Proof.** SPT sorts tasks in ascending order of $p_i$, placing the largest
task $p_{\max}$ in the last position. The last task in any schedule has
completion time $\sum_{i=1}^{n} p_i$, which is the maximum any individual
task can receive. Under any schedule that does not place $p_{\max}$ last,
it completes strictly before $\sum p_i$. $\blacksquare$

**Corollary 4.1.** A team optimizing unweighted mean completion time will
systematically deliver the worst experience to clients with the most
complex needs. This is not a side effect — it is the *mechanism* by which
the metric improves.

**Note on slowdown ratios.** SPT actually *compresses* slowdown ratios
($S_i = C_i / p_i$) because larger tasks in later positions have large
denominators that absorb the accumulated sum. For example, with tasks
$[1, 5, 10]$: SPT gives slowdowns $[1, 1.2, 1.6]$ (low variance) while
LPT gives $[1, 3, 16]$ in execution order, that is, for tasks $10, 5, 1$
(high variance). SPT's harm to large-task clients
is not visible in the slowdown ratio — it is visible in **absolute
completion time**. This distinction is important: the scheduling fairness
literature [21, 22, 23] has debated SPT/SRPT unfairness primarily through
slowdown-based measures, which can obscure the absolute-delay burden
proved below.

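The slowdown figures in the note can be reproduced in a few lines. This is a small illustrative sketch; `slowdowns` is a helper name introduced here, and results are listed in execution order.

```python
def slowdowns(order):
    """Slowdown C_i / p_i for each task, listed in execution order."""
    elapsed, ratios = 0.0, []
    for p in order:
        elapsed += p
        ratios.append(elapsed / p)
    return ratios

sizes = [1, 5, 10]
spt = slowdowns(sorted(sizes))                # tasks run as 1, 5, 10
lpt = slowdowns(sorted(sizes, reverse=True))  # tasks run as 10, 5, 1

print("SPT slowdowns:", spt)
print("LPT slowdowns:", lpt)
print("10-hour task finishes at", sum(sizes), "under SPT and at 10 under LPT")
```

The ratios look better under SPT, while the 10-hour task finishes 6 hours later in absolute terms.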
### 4.3 Delay Concentration

**Theorem 5 (SPT Concentrates Delay on the Largest Task).** Under SPT,
the largest task bears at least as much absolute delay as under any other
schedule, and strictly more than under any schedule that does not place
it last.

**Proof.** Define absolute delay as $\Delta_i = C_i - p_i$ (time spent
waiting, independent of own size). Under SPT, the largest task is in
position $n$ with:

$$\Delta_{\max\text{-task}}^{\text{SPT}} = C_n - p_n = \sum_{i=1}^{n-1} p_i$$

This is the sum of all other tasks' processing times — the maximum possible
delay for any single task. Under any schedule where the largest task is not
last, its delay is strictly less. Meanwhile, SPT gives the smallest task
zero delay ($\Delta_1^{\text{SPT}} = 0$). Queuing delay is shifted from
small tasks onto large ones. $\blacksquare$

SPT minimizes *total* delay (good for aggregate efficiency) by
concentrating delay onto the tasks best able to absorb it in slowdown-ratio
terms. But in absolute terms — hours spent waiting — the largest task bears
the full weight.

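To make the delay bound concrete, the sketch below tabulates the absolute delay of the largest task under every ordering of a three-task instance. The instance reuses the sizes from the slowdown note; `delay_of_largest` is a name introduced here for illustration.

```python
from itertools import permutations

def delay_of_largest(order):
    """Absolute delay C_i - p_i of the largest task in an execution order."""
    position = order.index(max(order))
    return sum(order[:position])  # total size of everything scheduled ahead of it

tasks = (1, 5, 10)
delays = {order: delay_of_largest(order) for order in permutations(tasks)}

for order, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(order, "-> largest task waits", delay, "hours")
```

The maximum wait of 6 hours is reached by every order that runs the largest task last, SPT among them. Any order that moves the largest task forward reduces its wait.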
### 4.4 Throughput Invariance

**Theorem 6 (Throughput Invariance).** Total work completed over any time
horizon $T$ is identical under all scheduling policies.

**Proof.** The executor processes work at a fixed rate and is assumed
never to idle while tasks remain. Over any horizon
$T \ge \sum p_i$, the total work done is exactly $\sum p_i$ regardless of
order; over a shorter horizon the executor is busy for all of $T$, so the
work done is again independent of order. For the steady-state case with
ongoing arrivals and a persistent backlog, the long-run
throughput is determined by the service rate $\mu$ and is completely
independent of scheduling. Writing $W(T)$ for the work completed by time $T$:

$$\lim_{T \to \infty} \frac{W(T)}{T} = \mu \quad \text{for all schedules } \sigma$$

$\blacksquare$

**Corollary 6.1.** A team that switches from any scheduling policy to SPT
will observe an improvement in unweighted mean completion time with **zero
change in actual throughput**. The metric improves. The output does not.

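The gap between work done and tickets closed can be shown directly. The sketch below assumes a never-idle executor working at one work-hour per hour; the helper names and the 6-hour horizon are choices made here for illustration.

```python
from itertools import permutations

def work_processed(order, horizon):
    """Work-hours processed by `horizon` on a never-idle, unit-rate executor."""
    clock, done = 0.0, 0.0
    for p in order:
        done += min(p, max(0.0, horizon - clock))  # part of this task inside the horizon
        clock += p
    return done

def tasks_closed(order, horizon):
    """Number of tasks fully completed by `horizon`."""
    clock, closed = 0.0, 0
    for p in order:
        clock += p
        if clock <= horizon:
            closed += 1
    return closed

tasks = (1, 5, 10)
for order in permutations(tasks):
    print(order, work_processed(order, 6), tasks_closed(order, 6))
```

Every ordering processes the same 6 work-hours in the first 6 hours. Only the count of closed tasks changes, from 2 under SPT to 0 when the 10-hour task runs first.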
### 4.5 The Compound Effect

Combining Theorems 4, 5, and 6:

| Measure | Effect of optimizing unweighted mean |
|---------|--------------------------------------|
| Throughput (work/time) | No change (Theorem 6) |
| Delay for small tasks | Minimized — approaches zero (SPT) |
| Delay for large tasks | **Maximized** — the largest task waits for every other task (Theorem 5) |
| Completion time of largest task | **Maximum possible**: $\sum p_i$ (Theorem 4) |

The net effect on perceived quality is negative because:

1. **Loss aversion is asymmetric** [8]. A client whose 100-hour task is
   deprioritized experiences a large, salient negative. A client whose
   1-hour task is expedited experiences a small, often unnoticed positive.

2. **High-effort tasks correlate with high-value clients.** Large tasks
   are disproportionately likely to come from major clients, complex
   contracts, or critical business needs.

3. **Starvation compounds.** In a continuous system (Theorem 3), large
   tasks may be **indefinitely deferred** as new small tasks keep arriving.

**Theorem 7 (The Core Result).** For a team processing tasks of non-uniform
size, adopting and optimizing unweighted mean completion time as a
performance metric:

(a) provides **zero productivity gain** (Theorem 6),
(b) **assigns the maximum possible completion time** to the largest task
(Theorem 4), and
(c) **concentrates queuing delay** on the largest tasks while
eliminating delay for the smallest (Theorem 5).

This is not a tradeoff. The metric creates a pure transfer of service
quality from high-effort clients to low-effort clients, with no net work
gained. $\blacksquare$

---

# Part II: Priority Systems

## 5. Breakdown Under Priority Classification

The preceding sections proved that unweighted mean completion time is
biased when tasks vary in size. We now show that introducing a **priority
system** — as virtually all real teams use — causes the metric to become
not merely biased but **actively adversarial** to the organization's stated
goals.

### 5.1 Extended Model: Tasks With Priority

Let each task $i$ have processing time $p_i$ and a priority class
$q_i \in \{1, 2, 3, 4\}$ where 1 is the highest priority (critical) and
4 is the lowest (cosmetic/enhancement). Assign priority weights:

$$w(q) = \begin{cases} 8 & q = 1 \text{ (Critical)} \\ 4 & q = 2 \text{ (High)} \\ 2 & q = 3 \text{ (Medium)} \\ 1 & q = 4 \text{ (Low)} \end{cases}$$

The specific weights are illustrative; the results hold for any strictly
decreasing weight function. The key property is that priority is assigned
by **business impact**, not by task size.

### 5.2 The Metric Contradicts the Priority System

**Theorem 8 (Priority-Size Inversion).** When priority is independent of
task size, the schedule that minimizes unweighted mean completion time
(SPT) will, in expectation, complete low-priority tasks before
high-priority tasks of greater size.

**Proof.** SPT orders tasks by $p_i$ ascending, regardless of $q_i$.
Consider two tasks:

- Task A: $p_A = 40$ hours, $q_A = 1$ (Critical — e.g., server outage)
- Task B: $p_B = 0.5$ hours, $q_B = 4$ (Low — e.g., cosmetic UI fix)

SPT schedules B before A. The unweighted mean for this pair:

$$\bar{C}^{\text{SPT}} = \frac{0.5 + 40.5}{2} = 20.5 \qquad \bar{C}^{\text{priority}} = \frac{40 + 40.5}{2} = 40.25$$

The metric declares SPT nearly **twice as good** — despite completing a
cosmetic fix while a server outage burns.

In general, when $q_i$ is statistically independent of $p_i$, SPT's
ordering has **zero correlation** with priority. In practice, Critical
tasks (outages, security incidents, data loss) often require more work
than Low tasks, so the metric is plausibly **anti-correlated** with the
priority system. $\blacksquare$

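The two-task example in the proof can be reproduced mechanically. The sketch below is illustrative; the tuple layout and the helper name `mean_completion` are assumptions made here, and the task labels paraphrase the example.

```python
def mean_completion(order):
    """Unweighted mean completion time; `order` is a list of (name, hours)."""
    elapsed, total = 0.0, 0.0
    for _, hours in order:
        elapsed += hours
        total += elapsed
    return total / len(order)

outage = ("A: server outage (Critical)", 40.0)
cosmetic = ("B: cosmetic UI fix (Low)", 0.5)

spt_order = sorted([outage, cosmetic], key=lambda task: task[1])  # shortest first
priority_order = [outage, cosmetic]                               # Critical first

print([name for name, _ in spt_order], mean_completion(spt_order))
print([name for name, _ in priority_order], mean_completion(priority_order))
```

The sort key looks only at hours, so the priority label plays no part in the SPT order. That is the point of the theorem.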
### 5.3 Information Destruction

The unweighted mean reduces a three-dimensional task $(p_i, q_i, C_i)$ to
a one-dimensional signal ($C_i$), then averages uniformly. This discards
priority entirely and implicitly inverts size.

**Theorem 9 (Information Destruction).** Let $I(\sigma)$ be the mutual
information between the schedule's implicit priority ranking (position)
and the actual priority assignment $q_i$. For SPT:

$$I(\sigma_{\text{SPT}}) = 0 \quad \text{when } p_i \perp q_i$$

**Proof.** SPT assigns positions based solely on $p_i$. When $p_i$ and
$q_i$ are independent, knowing a task's position in the SPT schedule
provides zero information about its priority. $\blacksquare$

**Corollary 9.1.** A team that optimizes unweighted mean completion time
is operating a scheduling system that carries zero information about its
own priority classification. The priority field in their ticketing system
is, with respect to execution order, decorative.

This is an instance of what Austin [18] calls the fundamental problem of
incomplete measurement: when the measurement system captures only a subset
of the relevant dimensions, optimizing the measurement systematically
degrades the unmeasured dimensions.

### 5.4 Priority-Weighted Delay Cost

Define the **priority-weighted delay cost** of a schedule:

$$D(\sigma) = \sum_{i=1}^{n} w(q_i) \cdot C_i$$

**Theorem 10 (SPT and Priority-Weighted Delay Cost).** The optimal
schedule for minimizing $D(\sigma)$ is WSJF: order by $w(q_i)/p_i$
descending [1, 5]. SPT's ordering, by $1/p_i$ descending, ignores
priority entirely and produces higher $D$ than WSJF whenever the two
orders disagree on tasks with unequal $w(q_i)/p_i$. The disagreement is
systematic when high-priority tasks tend to be larger.

**Proof.** By the exchange argument, if task $i$ is scheduled immediately
before task $j$, swapping the two reduces $D$ by:

$$\Delta D = w(q_j) \cdot p_i - w(q_i) \cdot p_j$$

The swap improves $D$ ($\Delta D > 0$) exactly when
$w(q_j)/p_j > w(q_i)/p_i$ but $j$ is
scheduled after $i$. Therefore the optimal order is decreasing
$w(q_i)/p_i$ — the WSJF rule. SPT corresponds to WSJF only when
$w(q_i) = \text{const}$ (all tasks have equal priority).

**Example.** Critical ($w = 8$, $p = 3$) and Low ($w = 1$, $p = 2$):

- SPT (Low first): $D = 1 \cdot 2 + 8 \cdot 5 = 42$
- WSJF (Critical first): $D = 8 \cdot 3 + 1 \cdot 5 = 29$

SPT incurs 45% more priority-weighted delay. In practice, Critical tasks
tend to be larger (outages, security incidents), making the divergence
systematic. $\blacksquare$

---

## 6. Proposed Solutions

### 6.1 Priority-Weighted Metrics

Replace unweighted mean completion time with the **Priority-Weighted
Completion Score (PWCS)**:

$$\text{PWCS}(\sigma) = \frac{\sum_{i=1}^{n} w(q_i) \cdot \frac{C_i}{p_i}}{\sum_{i=1}^{n} w(q_i)}$$

This is the priority-weighted mean slowdown ratio. It measures how long
each task waited relative to its size, weighted by how much that task
mattered. Lower is better.

**Properties:**

1. **Priority-respecting.** Delays to Critical tasks cost 8x more than
   delays to Low tasks.
2. **Size-fair.** Uses slowdown ratio $C_i / p_i$, so large tasks are not
   penalized for being large.
3. **Not minimized by SPT alone.** By Smith's rule [1], the order that
   minimizes the score is decreasing $w(q_i)/p_i^2$, so priority always
   enters the ordering. Sorting by processing time alone is optimal only
   when all priorities are equal.
4. **Reduces to the unweighted mean when tasks are uniform.** With equal
   priorities and a common size $p$, PWCS equals $\bar{C}/p$, a constant
   multiple of the unweighted mean.

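A minimal sketch of the PWCS computation follows. It assumes the illustrative weights from Section 5.1 and represents each task as an `(hours, priority)` pair; the function name `pwcs` and the sample tasks are introduced here and are not part of the paper.

```python
WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1

def pwcs(order):
    """Priority-weighted mean slowdown; `order` is a list of (hours, priority)."""
    elapsed, weighted, weight_total = 0.0, 0.0, 0.0
    for hours, priority in order:
        elapsed += hours
        w = WEIGHTS[priority]
        weighted += w * elapsed / hours  # w(q_i) * C_i / p_i
        weight_total += w
    return weighted / weight_total

critical, low = (3.0, 1), (2.0, 4)  # the two tasks from the Theorem 10 example
print(pwcs([critical, low]))        # Critical first
print(pwcs([low, critical]))        # Low first (the SPT order)

uniform = [(2.0, 4)] * 3            # equal priority, equal size
print(pwcs(uniform))                # equals the unweighted mean (4.0) divided by p (2.0)
```

For the two-task pair the score is lower when the Critical task runs first, even though it is the longer of the two. In the uniform case the score is the unweighted mean divided by the common task size.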
### 6.2 Optimal Policy: WSJF

**Theorem 11.** The schedule minimizing the priority-weighted completion
time $\text{PWCT}(\sigma) = \sum w(q_i) \cdot C_i / \sum w(q_i)$ processes
tasks in order of decreasing $w(q_i)/p_i$ — the **Weighted Shortest Job
First (WSJF)** rule [1, 5].

**Proof.** By the exchange argument (as in Theorem 10), the swap of
adjacent tasks $i, j$ improves PWCT when $w(q_j)/p_j > w(q_i)/p_i$ but
$j$ is scheduled after $i$. The optimal order is therefore decreasing
$w(q_i)/p_i$. $\blacksquare$

Within a priority class, this reduces to SPT (shortest first). Across
classes, a Critical 4-hour task ($w/p = 2.0$) beats a Low 1-hour task
($w/p = 1.0$).

**Practical caveat.** Pure WSJF can place tiny Low-priority tasks ahead
of large Critical tasks (a 15-minute Low task has $w/p = 1/0.25 = 4.0$,
beating a 6-hour Critical at $w/p = 8/6 = 1.33$). In practice, this is
mitigated by enforcing **strict priority-class ordering** and applying
WSJF only *within* each class.

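Both orderings reduce to a sort key. The sketch below is one possible encoding, assuming the Section 5.1 weights and `(name, hours, priority)` tuples; the function names and task labels are hypothetical. It also evaluates the priority-weighted delay cost $D$ from Section 5.4 so the two rules can be compared.

```python
WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1

def pure_wsjf(tasks):
    """Order (name, hours, priority) tuples by decreasing w(q)/p."""
    return sorted(tasks, key=lambda t: WEIGHTS[t[2]] / t[1], reverse=True)

def practical_wsjf(tasks):
    """Strict priority-class ordering, shortest first within each class."""
    return sorted(tasks, key=lambda t: (t[2], t[1]))

def delay_cost(order):
    """Priority-weighted delay cost D: the sum of w(q_i) * C_i."""
    elapsed, cost = 0.0, 0.0
    for _, hours, priority in order:
        elapsed += hours
        cost += WEIGHTS[priority] * elapsed
    return cost

# The pair from the practical caveat: a 15-minute Low task and a 6-hour Critical task.
caveat = [("low-15min", 0.25, 4), ("critical-6h", 6.0, 1)]
for rule in (pure_wsjf, practical_wsjf):
    order = rule(caveat)
    print(rule.__name__, [name for name, _, _ in order], delay_cost(order))
```

Pure WSJF runs the Low task first and achieves the lower $D$ (50.25 against 54.25), as Theorem 11 predicts. The practical variant accepts a slightly higher $D$ in exchange for never letting a lower class jump ahead of a Critical task.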
### 6.3 Applied Example: IT Service Desk

Consider an IT team with the following ticket queue:

| Ticket | Priority | Type | Est. Hours |
|--------|----------|------|-----------|
| T1 | P1 (Critical) | Email server down | 6 |
| T2 | P2 (High) | VPN failing for remote team | 4 |
| T3 | P3 (Medium) | New employee laptop setup | 2 |
| T4 | P4 (Low) | Update desktop wallpaper policy | 0.5 |
| T5 | P3 (Medium) | Install software license | 1 |
| T6 | P1 (Critical) | Database backup failing | 3 |
| T7 | P2 (High) | Printer fleet offline | 2 |
| T8 | P4 (Low) | Archive old shared drive folder | 0.25 |

**SPT order** (optimizing unweighted mean): T8, T4, T5, T3, T7, T6, T2, T1

| Pos | Ticket | Priority | Hours | Completion | Slowdown |
|-----|--------|----------|-------|------------|----------|
| 1 | T8 (archive folder) | P4 Low | 0.25 | 0.25 | 1.0 |
| 2 | T4 (wallpaper) | P4 Low | 0.5 | 0.75 | 1.5 |
| 3 | T5 (software) | P3 Med | 1 | 1.75 | 1.75 |
| 4 | T3 (laptop) | P3 Med | 2 | 3.75 | 1.875 |
| 5 | T7 (printers) | P2 High | 2 | 5.75 | 2.875 |
| 6 | T6 (backups) | P1 Crit | 3 | 8.75 | 2.917 |
| 7 | T2 (VPN) | P2 High | 4 | 12.75 | 3.188 |
| 8 | T1 (email) | P1 Crit | 6 | 18.75 | 3.125 |

**Practical WSJF** (priority-class-first, SPT within class):

| Pos | Ticket | Priority | Hours | Completion |
|-----|--------|----------|-------|------------|
| 1 | T6 (backups) | P1 Crit | 3 | 3 |
| 2 | T1 (email) | P1 Crit | 6 | 9 |
| 3 | T7 (printers) | P2 High | 2 | 11 |
| 4 | T2 (VPN) | P2 High | 4 | 15 |
| 5 | T5 (software) | P3 Med | 1 | 16 |
| 6 | T3 (laptop) | P3 Med | 2 | 18 |
| 7 | T8 (archive) | P4 Low | 0.25 | 18.25 |
| 8 | T4 (wallpaper) | P4 Low | 0.5 | 18.75 |

**Comparison:**

| Metric | SPT | Practical WSJF | Winner |
|--------|-----|----------------|--------|
| Unweighted mean completion | **6.56 hrs** | 13.63 hrs | SPT |
| P1 mean time to resolution | 13.75 hrs | **6 hrs** | WSJF |
| P2 mean time to resolution | **9.25 hrs** | 13 hrs | SPT |
| Time to fix email server | 18.75 hrs | **9 hrs** | WSJF |
| Time to fix database backups | 8.75 hrs | **3 hrs** | WSJF |
| Time to update wallpaper | **0.75 hrs** | 18.75 hrs | SPT |

The aggregate priority-weighted completion times are nearly identical
(PWCT: 10.20 hours under SPT vs 10.17 under practical WSJF) because
aggregation hides distributional damage.
The real difference is in the **per-priority-class** breakdown: the email
server is down for 18.75 hours under SPT versus 9 hours under WSJF. The
database backups fail for 8.75 hours versus 3.

The unweighted metric confidently reports SPT as **more than twice as
efficient** (6.56 vs 13.63), rewarding the team that updated desktop
wallpaper while the email server was on fire.

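The comparison figures can be recomputed from the ticket queue. The sketch below assumes the Section 5.1 weights and encodes each ticket as `(ticket, priority class, estimated hours)`; the names `QUEUE`, `completions`, and `report` are introduced here for illustration.

```python
WEIGHTS = {1: 8, 2: 4, 3: 2, 4: 1}  # w(q) from Section 5.1

# (ticket, priority class, estimated hours) from the queue above
QUEUE = [
    ("T1", 1, 6.0), ("T2", 2, 4.0), ("T3", 3, 2.0), ("T4", 4, 0.5),
    ("T5", 3, 1.0), ("T6", 1, 3.0), ("T7", 2, 2.0), ("T8", 4, 0.25),
]

def completions(order):
    """Map ticket -> completion time for tickets in execution order."""
    elapsed, done = 0.0, {}
    for ticket, _, hours in order:
        elapsed += hours
        done[ticket] = elapsed
    return done

def report(order):
    """Aggregate and per-priority-class metrics for one execution order."""
    done = completions(order)
    by_class = {}
    for ticket, priority, _ in order:
        by_class.setdefault(priority, []).append(done[ticket])
    return {
        "unweighted_mean": sum(done.values()) / len(done),
        "pwct": sum(WEIGHTS[q] * done[t] for t, q, _ in order)
                / sum(WEIGHTS[q] for _, q, _ in order),
        "class_mean": {q: sum(v) / len(v) for q, v in by_class.items()},
        "class_max": {q: max(v) for q, v in by_class.items()},
    }

spt = report(sorted(QUEUE, key=lambda t: t[2]))          # shortest first
wsjf = report(sorted(QUEUE, key=lambda t: (t[1], t[2]))) # class first, then shortest

for name, result in (("SPT", spt), ("Practical WSJF", wsjf)):
    print(name, round(result["unweighted_mean"], 2), round(result["pwct"], 2),
          result["class_mean"][1], result["class_max"][1])
```

Python's sort is stable, so T3 stays ahead of T7 in the SPT order, matching the table. The two aggregate PWCT values are almost equal, while the P1 mean and P1 maximum separate the schedules clearly.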
### 6.4 Recommended Metric Suite

Even priority-weighted aggregate metrics can fail to distinguish good from
bad schedules, because aggregation hides distributional damage. No single
metric suffices. A complete measurement system should track:

| Metric | What it measures | Formula |
|--------|-----------------|---------|
| **Mean completion by priority class** | Per-class responsiveness | $\bar{C}$ filtered by $q$ |
| **P1 mean time to resolution** | Critical incident response | $\bar{C}$ for $q = 1$ |
| **Throughput** | Raw work capacity | Work-hours completed / calendar time |
| **Aging violations** | Starvation prevention | Tasks exceeding SLA by priority |
| **Max completion time (P1/P2)** | Worst-case critical response | $\max(C_i)$ for $q \le 2$ |

The key insight: **per-priority-class metrics** expose scheduling failures
that aggregate metrics hide.

---

# Part III: Organizational Dynamics

## 7. When the Metric Is the Product

Sections 2–6 assume that client satisfaction is a function of *experienced
service quality*. But there exists a scenario in which this assumption
fails and the entire argument collapses.

### 7.1 The Self-Referential Metric

Suppose the provider reports the unweighted mean directly to the client
— on a dashboard, in an SLA report, on a marketing page — and the
client's satisfaction is derived primarily from *that number*:

$$U_{\text{client}} = f\!\left(\bar{C}(\sigma)\right), \quad f' < 0$$

Under this model, SPT genuinely maximizes client satisfaction (Theorem 1).
Throughput is unchanged (Theorem 6). The business outcome improves: same
work done, happier client.

**Every theorem in this paper remains mathematically correct. But the
conclusion inverts.** The metric is no longer a proxy that can be gamed —
it *is* the service quality, because the client has agreed to evaluate
quality by the aggregate number.

### 7.2 The Economics

This creates a coherent, stable equilibrium:

| Actor | Behavior | Outcome |
|-------|----------|---------|
| Provider | Optimizes unweighted mean (SPT) | Metric improves, no extra work |
| Client | Reads dashboard, sees low average | Reports satisfaction |
| Management | Sees satisfied client + good metric | Rewards team |

The provider extracts satisfaction at zero marginal cost, by optimizing a
number the client has accepted as a proxy for quality.

### 7.3 The Fragility

This equilibrium is stable only as long as the client never inspects their
own experience. It breaks when:

1. **The client checks their own ticket.** A CTO whose email server was
   down for 18.75 hours will not be reassured by "Average resolution:
   6.56 hours." The clients most likely to inspect are exactly the ones
   receiving the worst service (Theorem 4).

2. **A competitor offers per-ticket SLAs.** "P1 resolved within 4 hours"
   beats "average resolution under 7 hours" for any client with critical
   needs.

3. **The team internalizes the metric.** If the team believes the metric
   reflects real performance, they lose the ability to recognize when
   critical work is neglected. The metric becomes an epistemic hazard.

### 7.4 The General Pattern

This pattern — proxy replaces quality, proxy is optimized, quality
diverges, system is stable until tested by reality — recurs across domains.
Muller [19] documents it extensively as "metric fixation"; Campbell [24]
formalized the corrupting effect of using indicators as targets.

| Domain | Proxy metric | Underlying quality | Divergence |
|--------|-------------|-------------------|------------|
| IT support | Avg. resolution time | Critical system uptime | Server down 19 hrs, avg says 6.5 |
| Education | Test scores | Actual learning | Teaching to the test |
| Healthcare | Patient throughput | Patient outcomes | Faster discharges, higher readmission |
| Finance | Quarterly earnings | Long-term value | Cost-cutting inflates EPS, erodes capability |
| Software | Velocity (story points) | Product quality | Point inflation, features half-finished |

### 7.5 Information Asymmetry

Model the system as a game between provider (P) and client (C). P observes
individual $\{C_i\}$ and chooses $\sigma$; C observes only
$\bar{C}(\sigma)$. This is a **moral hazard** problem [10]: P's optimal
strategy is to minimize the observable signal regardless of the
unobservable distribution.

The equilibrium is a **pooling equilibrium** [9]: P's reported metric
looks identical regardless of the underlying priority-weighted performance.
It is stable until C obtains access to individual $C_i$ values — via a
customer portal, a competitor's transparency, or a sufficiently painful
incident.

### 7.6 The Uncomfortable Conclusion

The honest answer to "does optimizing the unweighted mean hurt the
business?" is: **not necessarily, as long as the client never looks behind
the number**. The honest answer to "is this sustainable?" is: it is
exactly as sustainable as any system in which the seller knows more than
the buyer — stable for extended periods, then rapid collapse when the
asymmetry is punctured.

---

## 8. The Psychological Cost of Knowing

Section 7 modeled the provider as a unitary actor. But teams are composed
of individuals. When a team member understands the proof — when they
*know* the metric is synthetic, that the dashboard is theater, that the
email server is still down while they close wallpaper tickets — a new cost
appears that the equilibrium model omitted.

### 8.1 The Hidden Variable: Team Awareness

| Actor | Observes individual $C_i$ | Observes $\bar{C}$ | Understands the proof |
|-------|--------------------------|--------------------|-----------------------|
| Management | Possibly | Yes | Varies |
| Team member | **Yes** | Yes | **Yes** (in this scenario) |
| Client | No | Yes | No |

The team member has full information. They see the ticket queue. They know
the email server has been down since 7 AM. They know they are closing a
wallpaper ticket because it improves the number. And they know *why*.

### 8.2 Cognitive Dissonance Under Full Information

Cognitive dissonance [11] arises when an individual holds contradictory
cognitions. Without understanding *why*, the contradiction can be
rationalized: "management knows best." Understanding the proof removes
the ambiguity. The team member now holds:

- **Cognition A:** "I am a competent professional. My job is to solve
  important problems."
- **Cognition B:** "I am closing a wallpaper ticket while the email
  server is down, because the metric is mathematically biased (Theorem 1),
  the reordering produces zero throughput gain (Theorem 6), and the only
  beneficiary is the dashboard (Section 7). I can prove this."

The dissonance is now *load-bearing*. The available resolutions — abandon
professional identity, reject the proof, advocate for change, or leave —
each impose costs that did not exist before.

|
||
|
||
### 8.3 Self-Determination Theory: Three Needs Violated

Deci and Ryan's Self-Determination Theory [12, 13] identifies three needs
predicting intrinsic motivation:

**Autonomy.** The metric constrains choices in a way the team member
knows is mathematically suboptimal. A worker who understands that the
process is provably counterproductive cannot feel autonomous while
following it.

**Competence.** The metric rewards *apparent* effectiveness (low $\bar{C}$)
while being invariant to *actual* effectiveness (Theorem 6). Genuine
competence — fixing the email server first — is *punished* by the metric.

**Relatedness.** The team member knows the client's email server is down.
They could help. They are instead updating wallpaper — not because it
helps anyone, but because it helps a number. The connection between work
and human impact has been severed, and the team member can see the severed
ends.

### 8.4 Moral Injury

Moral injury [16, 17] is the lasting harm caused by "perpetrating, failing
to prevent, bearing witness to, or learning about acts that transgress
deeply held moral beliefs" [17]. It has since been extended to business
settings [25]. The key distinction from burnout: **burnout is exhaustion
from doing too much. Moral injury is damage from doing the wrong thing.**

A team member who knows the email server is down, knows they should fix
it, closes a wallpaper ticket instead, and does so because the metric
requires it, is experiencing the structural conditions for moral injury.

### 8.5 Learned Helplessness and Metric Fatalism

Seligman's learned helplessness [14, 15] describes how exposure to
uncontrollable negative outcomes leads to passivity. The sequence:

1. The metric is flawed (proof understood).
2. Advocate for change.
3. Rejected ("the numbers are good, don't rock the boat").
4. Repeat with decreasing conviction.
5. Terminal state: "The metric is what it is. I'll just close tickets."

This is not laziness. It is the rational response to a system that
punishes correct behavior and rewards incorrect behavior, when the
individual lacks power to change the system.

### 8.6 The Adversarial Selection Spiral

Combining Section 7's equilibrium with the turnover dynamic:

1. Organization adopts unweighted mean. Metric looks good (SPT).
2. Aware, competent team members experience psychological costs (8.2–8.5).
3. Those members leave. Replaced by members who do not understand the
   metric's flaws or do not care.
4. The metric continues to look good — it always does under SPT,
   regardless of team competence (Corollary 6.1).
5. Actual service quality degrades, but the metric cannot detect this
   (Corollary 9.1).
6. Return to step 1.

The metric selects *against* the people who would improve the system and
*for* the people who will not challenge it. The system stabilizes at a
lower level of competence, invisible to its own measurement apparatus.

### 8.7 The Complete Cost Model

| Section 7 (visible) | Section 8 (hidden) |
|---------------------|---------------------|
| Client satisfied (good number) | Team dissatisfied (bad reality) |
| Throughput unchanged | Discretionary effort withdrawn |
| Metric improves | Competent members leave |
| Business economy stable | Institutional competence degrades |

These operate on different timescales: the equilibrium is visible
quarterly; the competence degradation is visible over years. The complete
model is: **the metric works, and it is destructive, and the destruction
is invisible to the metric.** The metric is fresh paint on corroded rebar.

---

## 9. Manager Internalization: The Actionable Solution

Sections 2–6 say reject the metric. Section 7 says the metric works
(for the business). Section 8 says it destroys the team. In practice,
most managers cannot unilaterally change the metric. The best solution is
company-wide metric reform. The *actionable* solution is what a single
informed manager can do right now.

### 9.1 The Strategy

A manager who understands the proof can **internalize the metric's
limitations without propagating them to the team**:

1. **Schedule primarily by priority.** The team works critical tasks first.
2. **Tactically interleave small tasks.** When a small low-priority task
   can be completed without materially delaying high-priority work, do it.
   Not because the metric demands it, but because it also needs to get
   done and costs almost nothing.
3. **Never reveal the metric as the motivation.** "Knock out this quick
   one while we wait for the vendor callback on the P1" — not "we need
   to bring our average down." The team's intrinsic motivation remains
   intact (Section 8). The manager absorbs the metric-management burden.

### 9.2 Formalization

The manager's problem is a constrained optimization:

$$\min_{\sigma} \sum_{i=1}^{n} w(q_i) \cdot C_i \quad \text{subject to} \quad \bar{C}(\sigma) \le \bar{C}_{\text{target}}$$

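
For a handful of tasks, the constrained problem can be explored by brute force. The sketch below is illustrative only: the task names, sizes, and weights are hypothetical, the weight column stands in for $w(q_i)$, and `best_schedule` is a name introduced here rather than anything defined in this paper. It enumerates every ordering, discards those whose unweighted mean exceeds the target, and returns the feasible ordering with the lowest priority-weighted delay.

```python
from itertools import permutations

# Hypothetical tasks: (name, processing_hours, priority_weight).
# The weights stand in for w(q_i); all values are illustrative assumptions.
TASKS = [
    ("email-server", 8.0, 10.0),
    ("vpn-outage", 4.0, 10.0),
    ("wallpaper", 0.5, 1.0),
    ("new-mouse", 0.25, 1.0),
]


def completion_times(order):
    """Cumulative finish time of each task when run back to back."""
    clock, finished = 0.0, []
    for _, hours, _ in order:
        clock += hours
        finished.append(clock)
    return finished


def mean_completion(order):
    """Unweighted mean completion time of an ordering."""
    times = completion_times(order)
    return sum(times) / len(times)


def weighted_delay(order):
    """Priority-weighted sum of completion times (the objective)."""
    times = completion_times(order)
    return sum(w * c for (_, _, w), c in zip(order, times))


def best_schedule(tasks, mean_target):
    """Exhaustive search over orderings; only practical for small n."""
    feasible = [
        order for order in permutations(tasks)
        if mean_completion(order) <= mean_target
    ]
    return min(feasible, key=weighted_delay) if feasible else None
```

With these numbers, a target of 6.0 hours admits the priority-weighted optimum, a target of 5.0 hours forces the schedule back to pure SPT, and a target of 4.0 hours is infeasible because even SPT averages 4.625 hours.
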

**Theorem 12 (Bounded Metric Cost of Priority Scheduling).** A manager
who uses SPT *within* each priority class and priority ordering *between*
classes will produce a metric close to the SPT-optimal value — the gap
arises only from between-class inversions.

**Proof sketch.** Within each priority class, SPT is free (all tasks have
equal priority), so no within-class pair is out of SPT order. The only
deviation from global SPT is the between-class ordering: pairs in which a
larger, higher-priority task runs before a smaller, lower-priority one.
Each such inverted pair adds exactly $p_{\text{large}} - p_{\text{small}}$
to the unweighted sum of completion times, so the total gap is the sum of
these differences, with at most one term per cross-class pair of tasks.
In practice, the gap is typically within 10–20% of SPT-optimal, although
it grows when the high-priority tasks are also the largest. $\blacksquare$

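
The gap described in the proof sketch can be checked numerically. The following sketch uses a hypothetical six-task queue (the class labels and sizes are made up for illustration) and compares priority-then-SPT against global SPT. It relies on the identity that the sum of completion times equals the sum of all task sizes plus, for every pair of tasks, the size of whichever runs first, so the excess over SPT is the sum of $p_{\text{large}} - p_{\text{small}}$ over pairs where the larger task runs first.

```python
from itertools import combinations

# Hypothetical queue: (priority_class, processing_hours); class 1 is most urgent.
QUEUE = [(2, 2.0), (1, 3.0), (3, 0.25), (1, 1.0), (3, 4.0), (2, 0.5)]


def total_completion(order):
    """Sum of completion times when tasks run back to back in this order."""
    clock = total = 0.0
    for _, hours in order:
        clock += hours
        total += clock
    return total


def inversion_cost(order):
    """Sum of (p_large - p_small) over pairs where the larger task runs first."""
    return sum(a[1] - b[1] for a, b in combinations(order, 2) if a[1] > b[1])


global_spt = sorted(QUEUE, key=lambda task: task[1])
priority_then_spt = sorted(QUEUE)  # by class, then by size within each class

gap = total_completion(priority_then_spt) - total_completion(global_spt)
print(gap, inversion_cost(priority_then_spt))
```

For this queue the gap is 9.5 hours in the unweighted sum (33.5 versus 24.0), and every inverted pair spans two priority classes.
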

### 9.3 The Manager as Information Barrier

| Layer | Sees metric | Sees priorities | Sees proof |
|-------|-----------|----------------|------------|
| Organization | Yes | Nominally | No |
| Manager | Yes | Yes | **Yes** |
| Team | No (shielded) | Yes | Irrelevant |
| Client | Yes (dashboard) | Via SLA | No |

The manager is the only actor holding all three pieces of information.
This is not manipulation — they are doing the right work in the right
order, and the metric happens to be acceptable because within-class SPT
is free.

### 9.4 The Competitive Breakdown

This strategy fails when the metric becomes **competitive between teams**.

**Case 1: Cooperative** — Teams measured for parity, not ranking. Each
manager independently uses the internalization strategy. The metric is
decorative but harmless. This is a **coordination game** with a stable
cooperative equilibrium.

**Case 2: Competitive** — Teams ranked by $\bar{C}$. This is a
**prisoner's dilemma**:

| | Team B: Priority-first | Team B: SPT |
|---|---|---|
| **Team A: Priority-first** | (Good work, Good work) | (A looks bad, B looks good) |
| **Team A: SPT** | (A looks good, B looks bad) | (Both look good, both do wrong work) |

The Nash equilibrium is (SPT, SPT). The internalization strategy is a
cooperative outcome that is **not stable under competition**: each team
can improve its own ranking by defecting to SPT.

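
The equilibrium claim can be verified mechanically. The sketch below assigns hypothetical ordinal payoffs to the four cells (4 for looking good alone, 3 for mutual priority-first, 2 for mutual SPT, 1 for looking bad alone); these numbers are an assumption chosen only to respect the ordering the table implies. It then lists every pure-strategy profile in which neither team can improve by deviating on its own.

```python
from itertools import product

STRATEGIES = ("priority-first", "SPT")

# Hypothetical ordinal payoffs (higher is better) as (team A, team B).
# Only the ordering matters; the specific numbers are assumptions.
PAYOFFS = {
    ("priority-first", "priority-first"): (3, 3),
    ("priority-first", "SPT"): (1, 4),
    ("SPT", "priority-first"): (4, 1),
    ("SPT", "SPT"): (2, 2),
}


def pure_nash_equilibria(payoffs, strategies):
    """Return strategy pairs where neither team gains by deviating alone."""
    equilibria = []
    for a, b in product(strategies, repeat=2):
        pay_a, pay_b = payoffs[(a, b)]
        a_is_best = all(payoffs[(alt, b)][0] <= pay_a for alt in strategies)
        b_is_best = all(payoffs[(a, alt)][1] <= pay_b for alt in strategies)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))
```

Under these assumed payoffs the only profile returned is mutual SPT, even though both teams would rank mutual priority-first higher.
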

### 9.5 Scope

| Condition | Viability |
|-----------|-----------|
| Metric used for health-check / parity | **Viable** |
| Metric visible but not ranked | **Viable** |
| Metric ranked across teams | **Fragile** — requires all managers to cooperate |
| Metric tied to compensation / resources | **Not viable** — prisoner's dilemma dominates |
| Metric reform possible at org level | **Unnecessary** — fix the metric instead |

**The best solution is company-wide. The actionable solution is a manager
who understands this proof, shields their team from the metric, schedules
by priority, and uses SPT only within priority classes to keep the number
reasonable.**

---

# Part IV: Assessment

## 10. Devil's Advocate

Intellectual honesty requires acknowledging where the argument has limits.

### 10.1 Simplicity Has Real Value

**Argument.** The unweighted mean requires no priority weights, no
task-size estimates, no calibration.

**Assessment: True.** But the unweighted metric does not avoid assumptions
— it *hides* them by implicitly setting all weights to 1 and all sizes to
1. A known-imprecise estimate of task size is still more informative than
the implicit assumption that all sizes are equal.

### 10.2 Minimizing the Number of People Waiting

**Argument.** SPT minimizes total person-hours spent waiting. If each
task represents one client, this is optimal.

**Assessment: Mathematically correct.** If you run a DMV and every
person's time is equally valuable, SPT is the right policy. It breaks
down when tasks are not 1:1 with clients, waiting cost is not uniform,
or the metric is used to evaluate teams rather than serve a literal queue.

### 10.3 SPT as a Triage Heuristic

**Argument.** When task sizes cluster tightly, SPT approximates FIFO
and the unweighted mean approximates the weighted mean.

**Assessment: Correct.** The coefficient of variation $CV = \sigma_p / \bar{p}$ determines distortion severity:

| $CV$ | Task size distribution | Distortion |
|------|----------------------|------------|
| < 0.3 | Tight (call center) | Negligible |
| 0.3 – 1.0 | Moderate (mixed IT) | Moderate |
| > 1.0 | Wide (typical IT queue) | Severe |

A typical IT desk spans 15 minutes to 40+ hours ($CV > 2$). The
distortion is not an edge case — it is the default.

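
As a quick way to place a real queue in the table above, the coefficient of variation can be computed directly from observed task sizes. This is a minimal sketch: the two sample queues are invented for illustration, the function names are introduced here, and the population standard deviation is assumed for $\sigma_p$.

```python
from statistics import mean, pstdev


def coefficient_of_variation(sizes):
    """CV = population standard deviation divided by the mean."""
    return pstdev(sizes) / mean(sizes)


def distortion_band(cv):
    """Map a CV onto the three bands of the table above."""
    if cv < 0.3:
        return "negligible"
    if cv <= 1.0:
        return "moderate"
    return "severe"


# Hypothetical samples: call handling times in minutes, IT tickets in hours.
call_center = [5, 6, 5, 7, 6]
it_queue = [0.25, 0.25, 0.5, 0.5, 1, 2, 40]

for label, sizes in (("call center", call_center), ("IT queue", it_queue)):
    cv = coefficient_of_variation(sizes)
    print(f"{label}: CV = {cv:.2f} ({distortion_band(cv)})")
```

A single 40-hour ticket among sub-hour requests is enough to push the sample IT queue past $CV = 2$, which is why the wide band dominates in practice.
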

### 10.4 Gaming Requires Malice

**Argument.** The theorems show the metric *can* be gamed, not that it
*will* be gamed.

**Assessment: This is the strongest counterargument.** If the metric is
purely informational and never influences behavior, the gaming incentive
is absent. However, any metric reported to management, tied to OKRs, or
discussed in retrospectives will influence behavior. This is Goodhart's
Law [6, 7] — and it applies to well-intentioned teams as reliably as to
cynical ones. The drift happens organically: completing three easy tickets
"feels productive" while the metric validates the feeling.

### 10.5 When the Unweighted Mean Is Defensible

The metric is defensible **only when all four conditions hold**:

1. Task sizes are approximately uniform ($CV < 0.3$)
2. No priority differentiation (all tasks equally important)
3. Each task represents exactly one client
4. The metric is not used to evaluate, reward, or direct behavior

These conditions are rarely met in the systems where the metric is most
commonly used.

---

## 11. Related Work

This paper sits at the intersection of several literatures that have not
previously been connected.

### 11.1 Scheduling Theory and Fairness

Smith [1] established the SPT optimality result and the WSJF rule in 1956.
Conway, Maxwell, and Miller [2] provided the comprehensive textbook
treatment. The fairness of size-based scheduling policies has been debated
in computer systems scheduling: Bansal and Harchol-Balter [22] investigated
SRPT unfairness; Wierman and Harchol-Balter [23] formalized fairness
classifications against Processor-Sharing; Angel, Bampis, and Pascual [21]
measured SPT schedule quality against fair optimality criteria.

This prior work analyzes fairness in CPU and server scheduling. The present
paper applies the same mathematical results to *organizational task
management*, where the "scheduler" is a human team, the "jobs" are client
requests with business-impact priorities, and the "objective function" is
a management metric. The mechanism is identical; the consequences differ
because organizational scheduling has priority systems, client
relationships, and psychological costs that CPU scheduling does not.

### 11.2 Measurement Dysfunction

Austin [18] proved that incomplete measurement — measuring only a subset
of relevant dimensions — creates incentives to optimize the measured
dimensions at the expense of unmeasured ones, and that this effect is not
merely possible but *inevitable* when measurement is tied to rewards. His
information-asymmetry framing closely parallels Section 7. The present
paper provides the specific mathematical mechanism (Theorems 1–2) for the
case of task scheduling, and extends the argument through psychology
(Section 8) to trace the complete chain of organizational harm.

Muller [19] documented "metric fixation" across education, healthcare,
policing, and finance, providing extensive empirical evidence for the
patterns theorized in Section 7.4. Campbell [24] formalized the corrupting
effect of using indicators as targets, complementing Goodhart's original
observation [6] and Strathern's generalization [7].

Bevan and Hood [26] empirically documented gaming behaviors in the English
public health system — including the exact patterns of "hitting the target
and missing the point" described in our Section 5.2.

### 11.3 Psychological Costs of Metric Dysfunction

The application of moral injury (Shay [16], Litz et al. [17]) to business
settings has recent precedent: a 2024 *Journal of Business Ethics* study
[25] explicitly extended the construct to for-profit workplaces, finding
structural conditions similar to those described in Section 8.4. Moore
[27] analyzed moral *disengagement* — the cognitive restructuring that
enables unethical behavior under organizational pressure. The present
paper addresses the complementary phenomenon: the harm to individuals who
*refuse* to disengage.

### 11.4 What Is Novel

The individual components — SPT optimality, Goodhart's Law, measurement
dysfunction, moral injury — all have precedent. The contributions of this
paper are:

1. **The conservation law (Theorem 2) used prescriptively** — as a
   constructive argument that work-weighted completion time *cannot* be
   gamed, rather than as a theoretical scheduling result.

2. **The specific proof that priority classes make the metric algebraically
   adversarial** (Theorems 8–9) — not merely empirically bad but
   structurally contradictory, with zero mutual information between the
   schedule and the priority system.

3. **The integrated chain** from mathematical proof through information
   asymmetry through psychological harm through adversarial selection
   spiral — tracing a single metric from Smith (1956) to organizational
   hollowing.

4. **The manager internalization strategy** (Section 9) with formal
   game-theoretic analysis of its stability and breakdown conditions
   under inter-team competition.

5. **The application of scheduling theory to organizational management
   critique** — proving that a commonly used team metric has specific,
   quantifiable pathologies rather than arguing from anecdote or
   general principle.

---

## 12. Conclusion

The unweighted average completion time is a **biased statistic** that:

1. **Can be gamed** by scheduling policy (Theorem 1), unlike work-weighted
   completion time which is schedule-invariant (Theorem 2).
2. **Incentivizes starvation** of large tasks (Theorem 3).
3. **Degrades client satisfaction** with zero compensating productivity
   gain (Theorem 7).
4. **Actively contradicts priority systems** by carrying zero information
   about business-impact classification (Theorem 9).
5. **Ignores priority entirely** in its scheduling recommendation,
   producing suboptimal priority-weighted delay whenever priority and
   size are not perfectly inversely correlated (Theorem 10).

A metric that can be improved by reordering work — without doing any
additional work — is measuring the scheduling policy, not the system's
capacity. When combined with a priority system, it recommends the schedule
that inflicts the most damage on the highest-priority work.

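
The contrast between a gameable and a schedule-invariant metric can be seen on a three-task example. The sketch below is a hedged illustration, not a restatement of Theorem 2: it assumes that "work-weighted" means weighting each completion time $C_i$ by the task's own processing time $p_i$, and the task sizes are hypothetical. It evaluates both means over every possible ordering.

```python
from itertools import permutations

# Hypothetical task sizes in hours.
SIZES = (0.5, 2.0, 8.0)


def completion_times(order):
    """Cumulative finish time of each task when run back to back."""
    clock, finished = 0.0, []
    for hours in order:
        clock += hours
        finished.append(clock)
    return finished


def unweighted_mean(order):
    """Every task counts equally, regardless of size."""
    times = completion_times(order)
    return sum(times) / len(times)


def work_weighted_mean(order):
    """Each completion time is weighted by the task's own size (assumed definition)."""
    times = completion_times(order)
    return sum(p * c for p, c in zip(order, times)) / sum(order)


unweighted = {round(unweighted_mean(o), 6) for o in permutations(SIZES)}
work_weighted = {round(work_weighted_mean(o), 6) for o in permutations(SIZES)}
print(sorted(unweighted), sorted(work_weighted))
```

Across the six orderings the unweighted mean ranges from 4.5 to 9.5 hours, while the work-weighted mean is 8.5 hours every time: reordering moves the first number and leaves the second untouched.
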

When the metric is reported to clients, it creates an information asymmetry
(Section 7) whose business equilibrium is profitable but fragile. When
team members understand its flaws, it violates their intrinsic motivation
and selects for the departure of the most competent people (Section 8).
A single informed manager can partially mitigate these effects through
constrained optimization (Section 9), but this cooperative strategy is
not stable under inter-team competition.

The unweighted mean is defensible only under narrow conditions
(Section 10.5): uniform task sizes, no priorities, one-to-one client-task
mapping, and no behavioral influence. These conditions are rarely met.

**Unweighted average completion time is not a fair or accurate measurement
of task execution performance. Its adoption as a team metric will
rationally produce starvation of complex work, violation of stated
priorities, inequitable client outcomes, and the illusion of productivity
where none exists.**

The best solution is organizational metric reform. The actionable solution
is a manager who understands this proof.

---

## References

### Scheduling Theory

[1] Smith, W. E. (1956). Various optimizers for single-stage production.
*Naval Research Logistics Quarterly*, 3(1–2), 59–66.
doi:[10.1002/nav.3800030106](https://doi.org/10.1002/nav.3800030106)

> Origin of the SPT optimality result (Theorem 1), the weighted completion
> time rule $w_i/p_i$ descending (WSJF, Theorem 11), and the adjacent-job
> pairwise interchange (exchange argument) proof technique used throughout.

[2] Conway, R. W., Maxwell, W. L., & Miller, L. W. (1967). *Theory of
Scheduling*. Addison-Wesley.

> Standard textbook treatment of single-machine scheduling theory,
> extending Smith's results.

[3] Little, J. D. C. (1961). A proof for the queuing formula: L = λW.
*Operations Research*, 9(3), 383–387.
doi:[10.1287/opre.9.3.383](https://doi.org/10.1287/opre.9.3.383)

> First rigorous proof of Little's Law. Referenced in Section 3.2 for
> queueing-theoretic context.

[4] Little, J. D. C. (2011). Little's Law as viewed on its 50th
anniversary. *Operations Research*, 59(3), 536–549.
doi:[10.1287/opre.1110.0941](https://doi.org/10.1287/opre.1110.0941)

> Retrospective discussing scope, limitations, and common misapplications.

[5] Reinertsen, D. G. (2009). *The Principles of Product Development
Flow: Second Generation Lean Product Development*. Celeritas Publishing.
ISBN: 978-0-9844512-0-8.

> Popularized WSJF and "Cost of Delay / Duration" in agile/lean contexts.
> Mathematical foundation is Smith (1956) [1].

### Measurement and Incentives

[6] Goodhart, C. A. E. (1984). Problems of monetary management: The U.K.
experience. In *Monetary Theory and Practice* (pp. 91–121). Macmillan.

> Source of Goodhart's Law: "Any observed statistical regularity will tend
> to collapse once pressure is placed upon it for control purposes."

[7] Strathern, M. (1997). 'Improving ratings': Audit in the British
university system. *European Review*, 5(3), 305–321.
doi:[10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4](https://doi.org/10.1002/(SICI)1234-981X(199707)5:3%3C305::AID-EURO184%3E3.0.CO;2-4)

> Generalized Goodhart's Law: "When a measure becomes a target, it ceases
> to be a good measure."

### Behavioral Economics

[8] Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of
decision under risk. *Econometrica*, 47(2), 263–292.
doi:[10.2307/1914185](https://doi.org/10.2307/1914185)

> Established loss aversion. Referenced in Section 4.5.

### Game Theory and Contract Theory

[9] Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty
and the market mechanism. *The Quarterly Journal of Economics*, 84(3),
488–500. doi:[10.2307/1879431](https://doi.org/10.2307/1879431)

> Information asymmetry and adverse selection. The pooling equilibrium in
> Section 7.5 is structurally analogous.

[10] Hölmstrom, B. (1979). Moral hazard and observability. *The Bell
Journal of Economics*, 10(1), 74–91.
doi:[10.2307/3003320](https://doi.org/10.2307/3003320)

> Formal treatment of moral hazard. The metric-reporting scenario in
> Section 7.5 is a moral hazard problem.

### Psychology

[11] Festinger, L. (1957). *A Theory of Cognitive Dissonance*. Stanford
University Press. ISBN: 978-0-8047-0131-0.

> Foundational theory. Referenced in Section 8.2.

[12] Deci, E. L., & Ryan, R. M. (1985). *Intrinsic Motivation and
Self-Determination in Human Behavior*. Plenum Press.
ISBN: 978-0-306-42022-1.

> Original treatment of Self-Determination Theory. Referenced in
> Section 8.3.

[13] Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and
the facilitation of intrinsic motivation, social development, and
well-being. *American Psychologist*, 55(1), 68–78.
doi:[10.1037/0003-066X.55.1.68](https://doi.org/10.1037/0003-066X.55.1.68)

> SDT overview linking need satisfaction to intrinsic motivation and
> well-being.

[14] Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape
traumatic shock. *Journal of Experimental Psychology*, 74(1), 1–9.
doi:[10.1037/h0024514](https://doi.org/10.1037/h0024514)

> Original demonstration of learned helplessness. Referenced in
> Section 8.5.

[15] Seligman, M. E. P. (1975). *Helplessness: On Depression,
Development, and Death*. W. H. Freeman. ISBN: 978-0-7167-0752-3.

> Extended treatment connecting learned helplessness to human depression
> and institutional behavior.

[16] Shay, J. (1994). *Achilles in Vietnam: Combat Trauma and the Undoing
of Character*. Atheneum / Simon & Schuster. ISBN: 978-0-689-12182-3.

> Introduced the concept of moral injury. Referenced in Section 8.4.

[17] Litz, B. T., Stein, N., Delaney, E., Lebowitz, L., Nash, W. P.,
Silva, C., & Maguen, S. (2009). Moral injury and moral repair in war
veterans: A preliminary model and intervention strategy. *Clinical
Psychology Review*, 29(8), 695–706.
doi:[10.1016/j.cpr.2009.07.003](https://doi.org/10.1016/j.cpr.2009.07.003)

> Formalized moral injury as a clinical construct. Definition quoted in
> Section 8.4.

### Organizational Measurement

[18] Austin, R. D. (1996). *Measuring and Managing Performance in
Organizations*. Dorset House. ISBN: 978-0-932633-36-1.

> Proved that incomplete measurement creates inevitable incentives to
> optimize measured dimensions at the expense of unmeasured ones. The
> information-asymmetry framing closely parallels Section 7. The single
> most important predecessor to this paper's argument.

[19] Muller, J. Z. (2018). *The Tyranny of Metrics*. Princeton University
Press. ISBN: 978-0-691-17495-2.

> Comprehensive treatment of "metric fixation" across education,
> healthcare, policing, and finance. Extensive empirical evidence for the
> patterns theorized in Section 7.4.

### Scheduling Fairness

[20] Coffman, E. G., Shanthikumar, J. G., & Yao, D. D. (1992).
Multiclass queueing systems: Polymatroid structure and optimal scheduling
control. *Operations Research*, 40(S2), S293–S299.

> Conservation laws in scheduling. The schedule-invariance of
> work-weighted completion time (Theorem 2) is an instance of these
> conservation laws.

[21] Angel, E., Bampis, E., & Pascual, F. (2008). How good are SPT
schedules for fair optimality criteria? *Annals of Operations Research*,
159(1), 53–64. doi:[10.1007/s10479-007-0267-0](https://doi.org/10.1007/s10479-007-0267-0)

> Directly measures SPT schedule quality against fairness criteria.
> Closest predecessor in scheduling theory to Section 4's fairness
> analysis.

[22] Bansal, N., & Harchol-Balter, M. (2001). Analysis of SRPT
scheduling: Investigating unfairness. *ACM SIGMETRICS Performance
Evaluation Review*, 29(1), 279–290.
doi:[10.1145/384268.378792](https://doi.org/10.1145/384268.378792)

> Investigates the belief that SRPT unfairly penalizes large jobs in
> computer scheduling. Argues unfairness is smaller than believed but
> acknowledges the core tension.

[23] Wierman, A., & Harchol-Balter, M. (2003). Classifying scheduling
policies with respect to unfairness in an M/GI/1. *ACM SIGMETRICS
Performance Evaluation Review*, 31(1), 238–249.

> Formalizes fairness definitions for scheduling policies by comparison
> to Processor-Sharing.

### Additional References

[24] Campbell, D. T. (1979). Assessing the impact of planned social
change. *Evaluation and Program Planning*, 2(1), 67–90.
doi:[10.1016/0149-7189(79)90048-X](https://doi.org/10.1016/0149-7189(79)90048-X)

> Campbell's Law: "The more any quantitative social indicator is used for
> social decision-making, the more subject it will be to corruption
> pressures and the more apt it will be to distort and corrupt the social
> processes it is intended to monitor." Complements Goodhart's Law [6].

[25] Ferreira, C. M., et al. (2024). It's business: A qualitative study
of moral injury in business settings. *Journal of Business Ethics*.
doi:[10.1007/s10551-024-05615-0](https://doi.org/10.1007/s10551-024-05615-0)

> Extends moral injury to for-profit workplaces. Validates Section 8.4's
> application of Shay/Litz beyond military and healthcare settings.

[26] Bevan, G., & Hood, C. (2006). What's measured is what matters:
Targets and gaming in the English public health care system. *Public
Administration*, 84(3), 517–538.
doi:[10.1111/j.1467-9299.2006.00600.x](https://doi.org/10.1111/j.1467-9299.2006.00600.x)

> Empirically documents gaming behaviors including "hitting the target
> and missing the point." Provides real-world evidence for Section 5.2's
> priority-metric contradiction.

[27] Moore, C. (2012). Why employees do bad things: Moral disengagement
and unethical organizational behavior. *Personnel Psychology*, 65(1),
1–48. doi:[10.1111/j.1744-6570.2011.01237.x](https://doi.org/10.1111/j.1744-6570.2011.01237.x)

> Analyzes moral *disengagement* — the cognitive restructuring enabling
> unethical behavior. Section 8 addresses the complementary phenomenon:
> harm to individuals who *refuse* to disengage.

---

*This proof was developed conversationally and formalized on 2026-03-28.*