Snapshot Isolation

In a snapshot isolated system, each transaction appears to operate on an independent, consistent snapshot of the database. Its changes are visible only to that transaction until commit time, when all changes become visible atomically to any transaction which begins at a later time. If transaction $T_1$ has modified an object x, and another transaction $T_2$ committed a write to x after $T_1$’s snapshot began, and before $T_1$’s commit, then $T_1$ must abort.

Snapshot isolation is a transactional model: operations (usually termed “transactions”) can involve several primitive sub-operations performed in order. It is also a multi-object property: operations can act on multiple objects in the system.

Snapshot isolation cannot be totally available; in the presence of network partitions, some or all nodes may be unable to make progress. For total availability, at the cost of allowing long-fork anomalies, consider parallel snapshot isolation, or (weaker, but more commonly supported) read committed.

Unlike serializability, which enforces a total order of transactions, snapshot isolation only forces a partial order: sub-operations in one transaction may interleave with those from other transactions. The most notable phenomena allowed by snapshot isolation are A5B Write Skew, which allow transactions to read overlapping state, modify disjoint sets of objects, then commit; and a read-only transaction anomaly, involving partially disjoint write sets.

Snapshot isolation implies read committed. However, it does not impose any real-time constraints. If process A completes write w, then process B begins a read r, r is not necessarily guaranteed to observe w. Some databases provide real-time variants of snapshot isolation. Compare with Strict Serializability , which provides a total order and real-time guarantees.

Moreover, snapshot isolation does not require a per-process order between transactions. A process can observe a write, then fail to observe that same write in a subsequent transaction. In fact, a process can fail to observe its own prior writes, if those writes occurred in different transactions. To enforce that transactions from the same process appear to execute in order, consider prefix-consistent snapshot isolation.

<aside> 💡 not requre a per-process order between transactions

即使在同一个 process 内也不保证任何跨事务的顺序，一个 process 的后一个事务的读可能看不见上一个事务的写。

</aside>

In its generalized form, snapshot isolation allows Pathological Ordering . For instance, a snapshot isolated database can always return the empty state for any reads, by appearing to execute those reads at time 0. It can also discard write-only transactions by reordering them to execute at the very end of the history, after any reads. Operations like increments can also be discarded, assuming the result of the increment is never observed. Luckily, most implementations don’t seem to take advantage of these optimization opportunities.

A Range of Snapshot Isolations

Papers vary in how strictly they constrain snapshot isolation’s allowed histories. The original paper by Berenson et al is somewhat ambiguous about the relationship between commit and start timestamps and wall-clock time, and subsequent work by Daudjee & Salem established “classic/strong” and “generalized/weak” SI levels. Adya’s formalism for SI assumes that we take as given specific choices made by the scheduler: whether a history is SI or not depends on what the scheduler does. This gives rise to a range of possible snapshot isolations depending on how the scheduler is implemented.

If one interprets SI as enforcing that transactions always choose a start time higher than every committed transaction’s commit time, then snapshot isolation is clearly incomparable to serializability: it forces a real-time order which Serializability does not.

Following extensive conversation with Bailis, Crooks, Fekete, Hellerstein, Sutra, and Shapiro, Jepsen interprets snapshot isolation in a more relaxed way: we take SI to prohibit dependency cycles between transactions which contain more than one adjacent read-write anti-dependency edge. Under this generalized definition, snapshot isolation is strictly weaker than Serializability . This definition is compatible with a broader set of implementations.

<aside> 💡 "edge" refers to a relationship between two transactions, where one transaction reads or writes a value that another transaction has written.

</aside>

<aside> 💡 "Adjacent" refers to the relationship between two edges that are next to each other in a sequence of operations.

</aside>

Formally

Berenson, Bernstein, et al first defined snapshot isolation in terms of an abstract algorithm:

… each transaction reads reads data from a snapshot of the (committed) data as of the time the transaction started, called its Start-Timestamp. This time may be any time before the transaction’s first Read. A transaction running in Snapshot Isolation is never blocked attempting a read as long as the snapshot data from its Start-Timestamp can be maintained. The transaction’s writes (updates, inserts, and deletes) will also be reflected in this snapshot, to be read again if the transaction accesses (i.e., reads or updates) the data a second time. Updates by other transactions active after the transaction Start-Timestamp are invisible to the transaction.

When the transaction T1 is ready to commit, it gets a Commit-Timestamp, which is larger than any existing Start-Timestamp or Commit-Timestamp. The transaction successfully commits only if no other transaction T2 with a Commit-Timestamp in T1’s execution interval [StartTimestamp, Commit-Timestamp] wrote data that T1 also wrote. Otherwise, T1 will abort. This feature, called First-committer-wins prevents lost updates (phenomenon P4). When T1 commits, its changes become visible to all transactions whose Start-Timestamps are larger than T1‘s Commit-Timestamp.