Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on the work, died similarly in 1933. Now it is our turn to study statistical mechanics. Perhaps it will be wise to approach the subject cautiously. — (Opening lines of "States of Matter", by D.L. Goodstein)
I hope to write a whole series about the Boltzmann distribution, but I can't do that without introducing it. It's one of the most fundamental results in thermodynamics, right up there with the idea that entropy should always increase.
To the point: if you have a system at temperature \( T \), then the probability weight of each state of energy \( E \) is
$$ p(E) \propto \exp(-E/T). $$
You may have seen this before. If you have, try to appreciate how absurdly simple it is. Notice how we didn't need to know what the system was made of, or how the energy states were distributed. The size, shape, and composition of the system don't matter. All that matters is that it can freely exchange energy (and for now, only energy) with the outside world.
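To make the proportionality concrete, here's a minimal numerical sketch (the energy levels and temperature below are made up purely for illustration): normalize the weights and you get actual probabilities.

```python
import numpy as np

# Hypothetical energy levels of some small system, in units where k_B = 1.
energies = np.array([0.0, 1.0, 2.0, 5.0])
T = 1.5  # temperature, in the same units as the energies

weights = np.exp(-energies / T)   # unnormalized Boltzmann weights
probs = weights / weights.sum()   # normalize so the probabilities sum to 1

for E, p in zip(energies, probs):
    print(f"E = {E:.1f}  ->  p = {p:.3f}")
```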
There are many ways to arrive at the Boltzmann distribution. The bog-standard one is to consider the system in contact with a large thermal bath, do some math, and conclude that the probability weight associated with each state is proportional only to the number of states of the bath. That's why the probability weight for each state of the same energy is the same.
This distribution doesn't care about the details of the system or the different energy states. It describes any thermodynamic system in equilibrium, whether it's made of dust, electrons, clay, or water. It doesn't matter how the energy states are distributed. In all cases, the distribution is the same, and the entire thermodynamics comes from this one simple thing.
There are standard ways of proving this, and they're not hard. The most common one checks all the boxes.
$$ p(E_i)p_R(E-E_i) = p(E_j)p_R(E-E_j) \rightarrow p(E_i) \propto \exp(-\beta E_i) $$
But I'll be honest: I don't think of temperature in terms of reservoirs and subsystems and the microcanonical ensemble.
This confused me. When I'm confused about a physics problem, I often dial up joshphysics. He's like my red phone for theory problems.
For the last little while we've been talking about the so-called fluctuation theorem.
When I got fired from DoltHub, my to-do list shrank to zero overnight, but instead of downshifting, my brain started working overtime. It decided that this was the time to pick up the loose threads that I had left off at UCLA. At the time I was studying things related to entropy and phase space, but I had very confused ideas about what those things were, and the pressure to get new "results" in my research every week prevented me from ever digging into them.
One thing I was confused about was entropy, but confusion being what it is, I wasn't even totally sure what I was confused about. I just started skimming pages on Wikipedia, and picked things up as I went along. But now that I had the time to sit down and challenge some of the articles I was reading (instead of immediately applying them to get "results") I realized I found many of them unconvincing. For one, every single "proof" of the second law of thermodynamics involved making some pretty big assumptions that did not at all seem justifiable to me.
It was around this point that I realized, in sort of a flash, that I did not have a clue why time moved forwards instead of backwards. And that's a humbling thing to realize.
So I pulled out the red phone and called Josh. It turned out that Josh didn't understand why time moved forwards instead of backwards either, and made me feel quite at home in my confusion about entropy. But being well-versed in what other people had to say, he immediately pointed me to something called the fluctuation theorem from exactly 30 years ago. The fluctuation theorem claims to resolve a lot of problems related to the direction of time. For example, it's our experience that ice melts, but there's actually no rule that stops nature from suddenly turning the water back into a block of ice. That's because the laws of physics work just as well backwards as they do forwards. Yet when you consider large numbers of particles together, like molecules in a block of ice, a clear direction of time emerges: ice melts, and never unmelts.
But pre-fluctuation theorem, no one could prove that. Now we know, at least in principle, that ice melting is exponentially more likely than ice unmelting. (It was this exponential relationship that they discovered.) The theorem, it turns out, is maybe ten lines of algebra total. Those ten lines might have stopped Boltzmann from killing himself. If only he had been a little smarter.
The problem is, after a couple weeks of reading the paper and studying the fluctuation theorem — and by studying I mean all day doing nothing but thinking about it and nothing else — I didn't understand it at all. In fact I think I understood it less than before I even read it. It's totally not obvious, which is maybe why despite being rather simple, it took more than a hundred years to discover it.
Plus, along the way towards (not) understanding the fluctuation theorem, I've encountered many other theorems and results I thought I understood, but didn't. I've begun to feel like I've built my house of knowledge by starting with the roof, which is somehow perched on rotten pylons. One of those pylons is the Boltzmann distribution, which is something everybody who takes chemistry or physics learns about in year one, and something that you feel so much pressure to understand that everyone convinces themselves that they do.
Entire fields of physics are built around simple ideas. Relativity is based around the idea that "the laws of physics over here are the same as over there." Quantum physics is built around the idea that "matter behaves like light." In statistical mechanics, the simple idea is that "an equilibrium system is equally likely to be in any state compatible with the laws of physics." The result of this assumption is the Boltzmann distribution, which says that the probability of a specific state of a system depends only on its energy, and
$$ p(E) \propto \exp(-\beta E). $$
It takes a while to appreciate how absurd this result is. You're telling me that the probability only depends on the energy, and that it scales like an exponential? How is this intuitive? How is this possible? It's shocking to me.
A motif in physics is that simple equations can usually be distilled down to simple arguments: "Ah, but if I simply observe [x] it all becomes astonishingly clear" I imagine saying to myself in my naive 19th-century voice. God would I kill for that feeling right now. I'm starting with the Boltzmann distribution, which seems simple enough, but in every proof I discover there's always some detail that eludes my understanding, which blocks me from that sweet feeling of completion and release. ("Perhaps it will be wise to approach the subject cautiously...")
There's something fascinating about temperature and the Boltzmann distribution. We have an intuition that if we put two things together that are the same temperature, they'll remain the same temperature. There's a notion of an invariance argument here: two systems at temperature \( T \) stay at \( T \) whether they're kept separate or put together. There's another fact: we have the intuition that a system's temperature doesn't change if you rotate it, translate it, wait a little bit, or even shift all the energy levels of the system (by, say, raising it in a gravitational field). Each of these arguments (as you'll see below) results in the Boltzmann distribution. But why? Aren't those two totally separate kinds of invariances?
Nevertheless, my goal is to find these simple arguments (or invent them -- though it's hard to say something new in a field that's 150 years old) and then synthesize them, hopefully, into an understanding for why the probability distribution takes this exponential shape, and hopefully with some deep intuition for it. After that, next stop: fluctuation theorem.
After this, it's all equations. If you don't have your ticket, you should get off the train.
An argument that states of equal energy have equal relative probability
A major assumption of statistical mechanics is that a system flows through phase space so that it goes close to every point. (It does not go through every point.) What's the intuition for this? Consider this non-rigorous, but intuitive, line of thought: instead of thinking of phase space (restricted to energy \(E\)) as a surface, think about it as a graph. Instead of points \( (p, q) \), there are vertices \( v_i \).
In the full phase space, trajectories can never merge, because the equations of motion are reversible — so the only way to end up going around a loop is to have started on one. (Plus, those special states take up "zero area" on the phase space surface.) The way this translates to the graph is that as we jump from vertex to vertex, we can never get funneled into a cycle that excludes our starting vertex. Along the way, no vertex can be visited twice — a vertex with two different predecessors would mean two trajectories merging, which reversibility forbids — so the trajectory keeps threading through fresh vertices until it finally closes up at the start. The vertices that are never visited are "unvisitable" in principle — they are disjoint from the rest of the graph, so we can ignore them.
The point is that in this model, the system has to visit every vertex once and only once, giving some credibility to the "equal probabilities" postulate. The more vertices you have, the more phase space you can credibly represent.
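Here's a toy version of that picture, a minimal sketch under the assumption that the dynamics is a random permutation of the vertices (which is what a deterministic, reversible update rule on a finite state space amounts to). Follow the orbit from some starting vertex: it closes into a single cycle, and over many periods every reachable vertex is visited equally often — the "equal probabilities" of the postulate.

```python
import numpy as np

rng = np.random.default_rng(0)

n_vertices = 20
step = rng.permutation(n_vertices)   # reversible dynamics: an invertible map v -> step[v]

# Follow the trajectory from vertex 0 until it closes into a cycle.
v, orbit = 0, []
while v not in orbit:
    orbit.append(v)
    v = int(step[v])

# Run for many periods and count how often each vertex is visited.
counts = np.zeros(n_vertices, dtype=int)
v = 0
for _ in range(10 * len(orbit)):
    counts[v] += 1
    v = int(step[v])

print("cycle length:", len(orbit))
print("visits per vertex on the orbit:", counts[orbit])  # all equal
```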
An argument that the probability only depends on energy
TBD
An argument for the exponential form of the Boltzmann distribution: no characteristic energy
Imagine you have a box in thermal equilibrium with, say, the air. The box and the air can exchange energy with each other, and the box's energy will change here and there. You want to know the probability of observing the box being in a specific energy state \( E_i \). The probability that the box is in a specific energy state is
$$ P(E_i) = \frac{f(E_i)}{\sum_k f(E_k)}. $$
Lift the box off the ground a few feet. The energy of every state of the box goes up by \( m g h \). But this shouldn't change \( P(E_i) \), since the probabilities determine the thermodynamics of the system, and the thermodynamics doesn't care whether the box is on the floor or on a shelf. Therefore, all the \(f(E_i)\) must have changed by a common factor: \( f(E_i + m g h) = c(h) f(E_i) \). If this is true for all \( h \), then \( f \) must be an exponential function, and therefore also
$$ P(E) \propto \exp(-\beta E). $$
Credit here.
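To spell out that last functional-equation step (standard math, nothing specific to this setup): write \( g(E) = \ln f(E) \). Then \( f(E + mgh) = c(h) f(E) \) for every height \( h \) becomes

$$ g(E + mgh) - g(E) = \ln c(h), $$

so the change in \( g \) across any energy interval depends only on the width of the interval, never on where the interval sits. A function with that property (assuming it's reasonably well-behaved) must be linear, \( g(E) = g(0) - \beta E \), and exponentiating gives \( f(E) \propto \exp(-\beta E) \).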
Another argument for the same: composition
Imagine you have two systems, \(A\) and \(B\), that are entirely separate but at the same temperature \( T \). Suppose that \(A\) is in a microstate with energy \(E_i\) and \(B\) in a microstate with energy \( E_j \). You can view this as a single system \( AB \), or as two separate systems, \( A \) and \(B\), and both descriptions must assign the same probability to this joint state. Therefore, if the probability that a system is in a specific energy state is
$$ P(E_i) = \frac{f(E_i)}{\sum_k f(E_k)}, $$
then
$$ \frac{f(E_i) f(E_j)}{Z_A Z_B} = \frac{f(E_i + E_j)}{Z_{AB}}. $$
We can rescale \( f(E_i) \) by a constant factor, so choose \( f(0) = 1 \). Setting \( E_i = E_j = 0 \) in the relation above then gives \( Z_A Z_B = Z_{AB} \), which leaves \( f(E_i) f(E_j) = f(E_i + E_j) \). That makes \( f \) an exponential function, so that
$$ P(E) \propto \exp(-\beta E). $$
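Spelling out the same step as before: with \( g(E) = \ln f(E) \) and \( f(0) = 1 \), the relation \( f(E_i) f(E_j) = f(E_i + E_j) \) says that \( g \) is additive, \( g(E_i) + g(E_j) = g(E_i + E_j) \), with \( g(0) = 0 \). Any reasonably well-behaved additive function is linear, \( g(E) = -\beta E \), so \( f(E) = \exp(-\beta E) \).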
Another argument: weak coupling
Suppose you have a box in contact with a large thermal reservoir, they're weakly coupled, and the entire system has a fixed total energy \( E \). Make the assumption that every microstate of the combined system at that energy is equally likely, that is, \( P_S(E) = \text{const.} \) ("S" for system). Weak coupling also lets us write the probability of a joint microstate as a product of a box factor and a reservoir factor. Consider two different microstates of the combined system, one where the box has energy \( E_i \), and the other where the box has energy \( E_j \). Then
$$ P_S(E) = P_B(E_i)P_R(E - E_i) = P_B(E_j)P_R(E - E_j). $$
Rewrite this as
$$ \frac{P_B(E_i)}{P_B(E_j)} = \frac{P_R(E - E_j)}{P_R(E - E_i)}. $$
Write \( P_R(E-E_i) \) as
$$ P_R(E-E_i) = e^{-E_i\frac{d}{dE}}P_R(E) \approx P_R(E)\, e^{-E_i\frac{d}{dE} \ln P_R(E)} $$
so that
$$ \frac{P_B(E_i)}{P_B(E_j)} = e^{-\beta (E_i - E_j)} $$
where we define \( \beta = -\frac{d}{dE} \ln P_R(E) \) — the minus sign because the probability of the reservoir sitting in any one particular microstate falls as its energy grows. This makes sense: \( 1 / P_R(E) = \Omega_R(E)\) (the count of reservoir states), and \( \ln\Omega_R(E) \) is the reservoir's microcanonical entropy \( S_R \). So you get the equation \( dS_R/dE = \beta \).
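To see the counting behind this in numbers, here's a minimal sketch with a made-up reservoir: an Einstein solid of \( N \) oscillators, so the number of reservoir states holding \( q \) quanta is \( \Omega_R(q) = \binom{q+N-1}{N-1} \). The box is a single oscillator holding \( i \) quanta, the total energy is fixed, and the weight of each box state is just the count of compatible reservoir states. The successive ratios come out (very nearly) constant — that's the exponential.

```python
from math import comb, log

N = 1000        # reservoir oscillators (made-up model, just for illustration)
E_total = 5000  # total quanta shared between box and reservoir

def omega_reservoir(q):
    """Number of ways to spread q quanta over N oscillators."""
    return comb(q + N - 1, N - 1)

# Weight of the box holding i quanta = number of compatible reservoir states.
weights = [omega_reservoir(E_total - i) for i in range(6)]
total = sum(weights)

for i, w in enumerate(weights):
    print(f"box energy {i}: p = {w / total:.4f}")

ratios = [weights[i + 1] / weights[i] for i in range(5)]
print("successive ratios p(i+1)/p(i):", [round(r, 4) for r in ratios])
print("implied beta:", round(-log(ratios[0]), 4))
```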
Another argument: detailed balance
This system is in contact with the outside world and can borrow energy from it or give energy back to it, so it fluctuates from one state to another, potentially one which has more or less energy. (When we say "state," we mean a specific snapshot of the system, or "microstate," and not an energy state, which can be made up of multiple states.) We assume that the transition probability \( t \) of fluctuating from a state of energy \(E_i\) to a state of energy \( E_j \) is determined solely by the difference in energy of the two states, so that \( t(E_i, E_j) = t(E_j - E_i)\).
Finally, assume the system is at equilibrium. Reaching equilibrium means that although the system can move from state to state (and all possible states are allowed) that the probability that a system occupies any state has stopped changing. That means that the net probability flow from state to state is also zero. (Outside of equilibrium the probability is still in the process of spreading out.) The way we're talking about this system, it's pretty much like a Markov walk, and I think that's a good picture to have in mind.
To flesh this out, take two specific states of this equilibrium system, \(A\) and \(B\). (A transition between them is allowed, since we've assumed that the transition probability depends only on their energies.) For the net probability flow to be zero, the flow from \( A \) to \( B \) — the probability of finding the system in \( A \), times the transition probability \( A \rightarrow B \) — must equal the flow from \( B \) back to \( A \). Balancing the probability flows between the states gives
$$ p(E_A) t(E_B - E_A) = p(E_B) t(E_A - E_B). $$
We can rewrite this more clearly if we take \( E_A = E \), \( E_B = E + \Delta E\), and rearrange slightly,
$$ \frac{p(E + \Delta E)}{p(E)} = \frac{t(\Delta E)}{t(-\Delta E)}. $$
Importantly, the right-hand side does not depend on the absolute energy, but only on the energy difference \( \Delta E\) between the two states. Therefore, for any energy \( E \), stepping up by \( \Delta E \) changes the probability by the same fixed ratio, no matter where you start. The only function that changes by a constant ratio under a constant shift is the exponential function,
$$ p(E) \propto \exp(-\beta E). $$
We have to choose a minus sign because energies are bounded from below and not above, and probabilities can't increase forever. The positive constant \( \beta \) is a new degree of freedom that determines how quickly \( p(E) \) decreases. We identify \( 1/\beta \) with the temperature. Though its value is indeterminate, we can try to understand what it means by going back and reasoning about the transition probabilities.
For example, if \( t(\Delta E) /t(-\Delta E) = 1\), the system is just as likely to transition up as it is to transition down. This implies \( p(E + \Delta E) = p(E) \) for any \( \Delta E \). This corresponds to \( \beta = 0 \) or, equivalently, infinite temperature. The system has no specific preference for any energy state, and moves through them entirely freely.
If we take the other extreme, \( t(\Delta E) / t(-\Delta E) = 0 \), the probability of ever transitioning upward in energy is zero, so the system sinks to its lowest-energy state and stays there. This corresponds to \( \beta = \infty \), or zero temperature. What we mean by temperature is intimately related to the transition probabilities between states.
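None of this reasoning is specific to physics: any Markov chain whose jump probabilities depend only on the energy difference, with \( t(\Delta E)/t(-\Delta E) = e^{-\beta \Delta E} \), settles into the exponential distribution. Here's a minimal sketch (the energy levels and \( \beta \) are made up, and I've used Metropolis-style acceptance probabilities, which satisfy exactly that ratio):

```python
import numpy as np

rng = np.random.default_rng(1)

energies = np.array([0.0, 0.5, 1.0, 2.0, 3.5])  # made-up energy levels
beta = 1.2
n_steps = 200_000

def t(dE):
    """Transition probability depending only on the energy difference,
    chosen so that t(dE) / t(-dE) = exp(-beta * dE)."""
    return min(1.0, float(np.exp(-beta * dE)))

state = 0
counts = np.zeros(len(energies))
for _ in range(n_steps):
    candidate = rng.integers(len(energies))  # symmetric proposal: any state, uniformly
    if rng.random() < t(energies[candidate] - energies[state]):
        state = candidate
    counts[state] += 1

empirical = counts / counts.sum()
boltzmann = np.exp(-beta * energies)
boltzmann /= boltzmann.sum()
print("empirical :", np.round(empirical, 3))
print("Boltzmann :", np.round(boltzmann, 3))
```

Up to sampling noise, the occupation frequencies should land on top of \( e^{-\beta E}/Z \), since the acceptance rule above is exactly the balance condition we wrote down.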
Conclusion
Tune in later for that.