Click on whatever strikes your fancy!
In the preamble to defining the catalytic model, Buhrman, Cleve, Koucky, Loff, and Speelman use the metaphor of being given a hard drive full of data to borrow for computation. Given this "real-world" scenario, what is the first not-at-all-theoretical thing you would do to get some power out of this extra memory? For most, the answer is compression; real data is often not optimized, and there's plenty of room to gain from applying existing compression software even without theoretical guarantees.
Moving to theory, we have to confront the fact that most strings $\tau$ are sadly incompressible. What do we do then? Are we back to square one? No, because an incompressible, i.e. random, string is a great resource.
Access to a polynomial-length random string is a boon for space-bounded algorithms. For one, while the relationship between BPL and L is a hot area of study, there are many random walk-based algorithms that are much simpler (and often more efficient) than their deterministic counterparts. Furthermore, unlike the read-once randomness in the usual space-bounded setting, a random catalytic tape can be read as many times as we want, which lets us study more interesting randomized algorithms, such as ones involving the isolation lemma.
A random catalytic tape is also, as it turns out, an interesting object when it comes to analyzing catalytic algorithms themselves; in particular, a machine run on the average catalytic tape is much better behaved than one run on a worst-case tape (see "structure in catalytic space" section below). Thus when trying to show the ability of "weak" catalytic models (e.g. deterministic, time-bounded, etc.) to simulate "strong" catalytic models (e.g. randomized, non-deterministic, time-unbounded, etc.), a key first step is to reduce the problem to studying a random catalytic tape.
Unfortunately, the above framework assumes we have the ability to compress non-random strings, when even distinguishing random and compressible strings is uncomputable in the most general sense and very difficult even in restricted settings. Furthermore, this compression needs to be done in-place, i.e. without the ability to store an extra copy of the original string for reference while calculating the compressed representation. Lastly, in most (but not all) settings, the output of our algorithm is not allowed to depend on the catalytic tape, so if we use this string for its random properties, we still must decide the problem without any "probability" of failure.
This gives rise to a very interesting setting for compression algorithms, where we first analyze what property we really want from our random string, and then compress all strings that do not fulfill exactly this property. This connects to other settings in TCS, such as the lossy code problem in TFNP, and forces us to revisit classic randomized algorithms—again the isolation lemma is a key example—and "half-derandomize" them: rather than a probabilistic argument showing that the average string is good, we show that every string that fails is provably compressible, even by a relatively weak machine.
As per the earlier discussion, there are two existing groups of results coming from the compress-or-random framework: catalytic simulations of classical algorithms and catalytic simulations of other catalytic classes.
On the classical side, there is an unpublished result from the early era of catalytic computing, due to Dulek and others, showing BPL is in CL; this follows from a test, due to Nisan, for when a collection of random strings gives the correct majority result for a given BPL machine. More recently, Agarwala and Mertz showed that bipartite maximum matching is in CL by observing that a weight scheme which fails to isolate a single maximum matching gives two maximum matchings of equal weight, which by definition means that the weight of any single edge not in their intersection can be recovered, and hence erased, as long as the two matchings themselves can be found (the technical heart of the paper lies in this last step).
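To see the recovery step in isolation, here is a minimal sketch in Python; the graph, weights, and matchings are made up purely for illustration, and this is only the arithmetic identity the argument hinges on, not the algorithm of the paper itself.

```python
# Toy demonstration: if two distinct maximum matchings M1 != M2 have equal
# total weight, then the weight of any edge e in M1 \ M2 is determined by
# the remaining weights, so its bits on the catalytic tape are redundant.
# The graph and numbers below are invented for illustration only.

weights = {                     # a 4-vertex bipartite graph on {a,b} x {c,d}
    ("a", "c"): 3, ("a", "d"): 5,
    ("b", "c"): 2, ("b", "d"): 4,
}

M1 = {("a", "c"), ("b", "d")}   # one maximum matching, total weight 3 + 4 = 7
M2 = {("a", "d"), ("b", "c")}   # another maximum matching, total weight 5 + 2 = 7
assert sum(weights[e] for e in M1) == sum(weights[e] for e in M2)

e = ("a", "c")                  # an edge of M1 not in M2; "erase" its weight
erased = weights.pop(e)

# Recover it as w(e) = w(M2) - w(M1 \ {e}), using only the surviving weights.
recovered = sum(weights[f] for f in M2) - sum(weights[f] for f in M1 if f != e)
assert recovered == erased
```

In the actual argument these redundant bits are what get freed up as workspace, and the real work lies in finding $M_1$ and $M_2$ using only that freed space.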
With regard to catalytic-catalytic comparisons, Cook, Li, Mertz, and Pyne gave a compression scheme realizing the fact that catalytic machines take polynomial time in expectation over all catalytic tapes: any tape on which the machine runs longer than this expectation is itself compressible. This immediately led to showing that any function which is computable in catalytic logspace (with no time restriction) and in polynomial time (with no space restriction) is computable in poly-time catalytic logspace. They also combined this with the existing compress-or-random idea in reconstructive PRGs to derandomize catalytic logspace, but this argument was subsumed and generalized by Koucky, Mertz, Pyne, and Sami, who gave a simple argument for removing both randomness and non-determinism from catalytic computation using only the expectation compression argument.
There is one other small use of compress-or-random, from the original paper of Buhrman et al., which is the oracle setting. They show that CL contains PSPACE in the right oracle setting by having the oracle perform both tasks: first, the oracle is designed to answer PSPACE-complete problems given a random string; and second, the oracle takes care of compressing and decompressing non-random strings.
Besides compression, the other natural idea of how to use and restore the catalytic tape is reversible computation. While any catalytic algorithm can always be made reversible (see "structure in catalytic space" section below), we can aim to design algorithms which are inherently reversible as well.
One such testbed is the arithmetic world, where the inverse of storing a value in memory is an operation on equal footing with storing it: rather than computing and erasing, we will add and subtract values instead. This is most easily expressed as a register program, where all of our memory is treated as registers over a given field and our manipulations of memory are purely algebraic.
If our basic subroutines add and subtract values to and from catalytic memory, we can analyze our computation around these building blocks. Let $R$ be a register in catalytic memory, initially holding a value $\tau$, and let $v$ be some value which we can add to or subtract from $R$ via pre-specified programs $P$ and $P^{-1}$. Of course, if we could store $\tau$ elsewhere and compare it to the value of $R$ after running $P$, namely $\tau + v$, we would be able to use $v$ directly, but this goes against the principle of using only arithmetic operations on catalytic memory. What is there to do with $\tau$ and $\tau+v$ separately, then, given that we cannot look directly at $v$?
Arithmetic provides us with another crucial gem: the ability to cancel out terms. As a moral equivalent of our above idea, if we were to subtract $R$ from some other register, say $R'$, then run $P$ and add $R$ to $R'$, the net result would be to subtract $\tau$ and then add $\tau + v$; that is, we add $v$ to $R'$ as desired.
But this goes further. Say this situation were duplicated across two values, so we are given programs $P_1$ and $P_2$ (plus their inverses) which add (subtract) values $v_1$ and $v_2$ to registers $R_1$ and $R_2$, and from this we want to derive a function of $v_1$ and $v_2$ in a third register $R_3$. If we want to compute $v_1 + v_2$, the above strategy solves this easily: subtract $R_1$ and $R_2$ from $R_3$ before running $P_1$ and $P_2$, and add them back to $R_3$ after.
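Here is a minimal sketch of this addition trick in Python, as a toy simulation over the integers; the values and register contents are made up, and the programs $P_1, P_2$ are modeled as black boxes that add (or, with a sign flip, subtract) their hidden values.

```python
# Toy register-program simulation of the addition trick. tau1, tau2, tau3
# stand in for arbitrary initial catalytic contents; v1 and v2 are hidden
# values only accessible by running the given programs P1, P2 (or inverses).

tau1, tau2, tau3 = 17, -4, 99
v1, v2 = 6, 11

R = {1: tau1, 2: tau2, 3: tau3}        # the catalytic registers

def P1(sign=+1): R[1] += sign * v1     # the given program (sign=-1 is its inverse)
def P2(sign=+1): R[2] += sign * v2

# Add v1 + v2 into R3: subtract R1 and R2 before running P1 and P2,
# then add them back afterwards; the taus cancel out.
R[3] -= R[1]; R[3] -= R[2]             # R3 = tau3 - tau1 - tau2
P1(); P2()                             # R1 = tau1 + v1, R2 = tau2 + v2
R[3] += R[1]; R[3] += R[2]             # R3 = tau3 + v1 + v2
P1(-1); P2(-1)                         # restore R1 and R2

assert (R[1], R[2], R[3]) == (tau1, tau2, tau3 + v1 + v2)
```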
What about $v_1 \cdot v_2$? Taking the product of $R_1$ and $R_2$ does not produce $v_1 \cdot v_2$ by itself, but it does produce $(\tau_1 + v_1)\cdot(\tau_2 + v_2)$, which has the term we want plus three junk terms. We now only have to remove these junk terms one at a time, which can be done with more products of $R_1$ and $R_2$ after toggling $v_1$ and $v_2$ in and out of their respective registers.
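Continuing the same toy simulation, here is a sketch of the multiplication step; the four products with $v_1$ and $v_2$ toggled in and out cancel every junk term, leaving exactly $v_1 \cdot v_2$ added to $R_3$. (This mirrors the toggling idea described above as a sanity check; it is not the exact program of Ben-Or and Cleve.)

```python
# Toy register-program simulation of the multiplication trick, with the
# same invented setup as before.

tau1, tau2, tau3 = 17, -4, 99
v1, v2 = 6, 11

R = {1: tau1, 2: tau2, 3: tau3}

def P1(sign=+1): R[1] += sign * v1
def P2(sign=+1): R[2] += sign * v2

# Add v1 * v2 into R3 via four products of R1 and R2:
#   + (tau1+v1)(tau2+v2) - tau1(tau2+v2) - (tau1+v1)tau2 + tau1*tau2 = v1*v2.
P1(); P2()               # R1 = tau1+v1, R2 = tau2+v2
R[3] += R[1] * R[2]      # + (tau1+v1)(tau2+v2)
P1(-1)                   # R1 = tau1
R[3] -= R[1] * R[2]      # - tau1*(tau2+v2)
P1(); P2(-1)             # R1 = tau1+v1, R2 = tau2
R[3] -= R[1] * R[2]      # - (tau1+v1)*tau2
P1(-1)                   # R1 = tau1 (R2 is already tau2)
R[3] += R[1] * R[2]      # + tau1*tau2

assert (R[1], R[2], R[3]) == (tau1, tau2, tau3 + v1 * v2)
```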
The above argument was developed by Ben-Or and Cleve in the late 80s, following up on the celebrated Barrington's Theorem, to simulate arithmetic NC$^1$ circuits. This style of proof was revived in the very first catalytic paper, due to Buhrman et al., who showed the same computation idea for majority gates, thus putting TC$^1$ in CL. Finally it was generalized to arbitrary polynomials by Cook and Mertz in a series of papers, each with increasing efficiency with regard to both register memory and the number of calls to the input program, chiefly for the purpose of showing low-space algorithms for a generalized formula model called tree evaluation. This argument was then directly used by Williams in an earthquake result, showing that any time-$t$ computation on a multi-tape Turing machine can be simulated in space just $\sqrt{t \log t}$.
Since the shift to arbitrary polynomials, there has been less effort spent on designing more efficient programs for specific functions (as was done in all previous papers), but there is some partial work due to Cook and Mertz as well as Alekseev, Filmus, Mertz, Smal, and Vinciguerra, both showing tradeoffs between time and space efficiency. The former studied arbitrary polynomials but in a different parameter regime from the tree evaluation works, while the latter specifically analyzed functions computable by low-depth circuits, showing a logarithmic saving over the generalization of Buhrman et al. for catalytically computing SAC$^2$.
The only trivial upper bound on catalytic space is pure space equal to the total (free plus catalytic) memory. In order to improve this, we need to look deeper into the structure of catalytic algorithms. We will borrow from two fundamental results on ordinary space: first, the straightforward fact that space-$s$ algorithms can be simulated in time $2^{O(s)}$; and second, the much more intriguing (and much more recently discovered) fact that all algorithms can be made reversible with only a constant-factor overhead in space.
The proof that SPACE[$s$] $\subseteq$ TIME[$2^{O(s)}$] is taught in any introductory complexity course: the number of configurations of a machine using space $s$, after accounting for tape heads and the state of the machine, is at most exponential in $s + O(\log s)$, and since in any computation no configuration can be repeated without falling into an infinite loop, this count of configurations is also an upper bound on the time of the machine.
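Spelling out this count (with the input head position, a factor of $n+1$, folded into the low-order term, which is harmless whenever $s \geq \log n$):

$$\#\{\text{configurations}\} \;\leq\; \underbrace{O(1)}_{\text{state}} \cdot \underbrace{s}_{\text{head position}} \cdot \underbrace{2^{s}}_{\text{work tape contents}} \;=\; 2^{s + O(\log s)}.$$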
This argument, of course, fails for CSPACE[$s,c$], since there are $2^s \cdot 2^c \cdot 2^{O(\log (s+c))}$ configurations. However, there are also $2^c$ starting configurations, one for each initial catalytic tape, and crucially, for a fixed $x$, the computation paths from two different starting configurations, say with tapes $\tau_1$ and $\tau_2$, can never intersect: if they did, they would coincide from that point onwards and end with the same catalytic tape, meaning one of $\tau_1$ and $\tau_2$ would not be reset. Thus given $x$, each configuration lies on at most one of the $2^c$ paths, giving $2^s \cdot 2^{O(\log(s + c))}$ configurations per path on average.
This observation is enough to give a randomized poly-time algorithm for CL: simply choose a random starting catalytic tape $\tau$ and simulate the machine for $2^{O(s)}$ steps. Not only do most choices of $\tau$ halt within this bound, but the fraction of $\tau$s which continue to run drops very quickly as the time bound increases.
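To quantify "drops very quickly": the $2^{s+c+O(\log(s+c))}$ total configurations are split among $2^c$ disjoint paths, so the running time on a uniformly random $\tau$ is at most $2^{s+O(\log(s+c))}$ in expectation, and Markov's inequality turns this into a geometrically decaying tail:

$$\Pr_{\tau}\left[\text{running time on } \tau > 2^{k} \cdot 2^{s+O(\log(s+c))}\right] \;\leq\; 2^{-k}.$$

For CL, where $s = O(\log n)$, this expectation is polynomial in $n$, matching the time budget of the simulation.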
Lange, McKenzie, and Tapp showed how to make space-bounded computation inherently reversible by taking an Eulerian tour around the configuration graph, i.e. walking around the trees of dead configurations hanging off the back of the path defined by running on the given $x$. With enough care, and using the non-intersection observation from the average-case argument above, this result naturally generalizes to catalytic space. Thus while we can directly design reversible catalytic algorithms (see "register programs" section above), we can also take any existing catalytic algorithm and generically transform it into one which executes some forward computation, finds the answer we are seeking, and then walks backwards step-by-step along the entire computation until it reaches its starting configuration.
The results of the above two arguments are mostly just as stated: CL is in ZPP and CL equals reversible CL. However, these two arguments were also combined by Cook, Li, Mertz, and Pyne in the first published compression-based catalytic algorithm (see "compress or random" section above), where they used reversibility to design a compression algorithm for every catalytic tape on which the machine runs for longer than the expected $2^s$ steps.