Behind the scenes: Are prediction markets good for anything? (6 minute read)
An analysis of prediction markets like Polymarket and Kalshi finds they deliver limited public value compared to expectations, with AI chatbots potentially offering better forecasting than market-based approaches.
Deep dive
- Prediction markets with billions in volume and millions of viewers are not delivering the public goods that were expected 20 years ago when the concept was first proposed
- Metaculus, a non-market prediction platform, appears more accurate than money-based prediction markets, though direct comparison is difficult because platforms attract different question difficulties
- The hardest forecasting questions (like AI future developments) barely beat random chance, while easier questions (macroeconomics) approach oracle-level accuracy
- Despite 18 months of high liquidity on Polymarket, the platform lacks the financial infrastructure of mature markets—no insider trading rules, capital risk controls, or hedging mechanisms that institutional money requires
- Polymarket has invested heavily in crypto infrastructure but not in traditional financial market safeguards, and their track record suggests this won't change
- The biggest bottleneck for all forecasting platforms—including academic research and AI development benchmarks—is writing high-quality questions that reveal non-obvious insights through research
- Good forecasting questions are ones where expert forecasters' views fluctuate and update significantly as they research, rather than converging quickly to an obvious consensus
- Prediction markets may be changing societal norms around probabilistic thinking and uncertainty communication in news, which could be their most valuable long-term contribution
- AI language models are rapidly becoming competitive with prediction markets at providing forecasts, and they also aggregate information (across training data, sources, and evidence) just through different mechanisms
- The fundamental premise that markets are the best information aggregation method for forecasting is being challenged both by Metaculus's accuracy and by AI's emerging capabilities
- The value proposition of prediction markets is "rapidly decreasing" as free or cheap AI chatbots provide increasingly good answers without the social costs of gambling infrastructure
Decoder
- Polymarket: A cryptocurrency-based prediction market where users bet real money on outcomes of future events
- Kalshi: A regulated prediction market platform in the United States where users can trade on event outcomes
- Metaculus: A free forecasting platform that aggregates predictions without money or markets, using reputation and scoring systems instead
- Calibration: A measure of how well predicted probabilities match actual outcomes (e.g., events given 70% probability should occur roughly 70% of the time)
- FutureSearch: Dan Schwarz's company working on AI forecasting and research capabilities
- Mantic: An AI forecasting company running tournaments and research on prediction questions
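The calibration definition above can be checked empirically: group forecasts into probability bins and compare each bin's average predicted probability with how often the event actually occurred. A minimal sketch in Python (the function name and ten-bin scheme are illustrative, not from the article):

```python
def calibration_table(forecasts, outcomes, n_bins=10):
    """forecasts: predicted probabilities in [0, 1];
    outcomes: 1 if the event happened, else 0.
    Returns (avg predicted prob, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[i].append((p, y))
    table = []
    for b in bins:
        if b:
            avg_p = sum(p for p, _ in b) / len(b)  # mean predicted probability
            freq = sum(y for _, y in b) / len(b)   # how often the event occurred
            table.append((avg_p, freq, len(b)))    # well calibrated if avg_p ≈ freq
    return table
```

A platform is well calibrated when, in every bin, the two numbers roughly match; for example, events given 75% probability should land in the 0.7–0.8 bin with an observed frequency near 0.75.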
Original article
Behind the scenes: Are prediction markets good for anything?
Clara Collier interviews Dan Schwarz about his newest Asterisk piece.
In this behind-the-scenes interview, Clara Collier talks to Dan Schwarz, author of our latest essay, Are Prediction Markets Good for Anything?
Clara and Dan talk about how Dan's thinking evolved as he wrote the piece, the case for continuing optimism despite the casino problem, the (waning?) importance of the "market" in "prediction market," why it seems like Metaculus is just better, and more.
Clara Collier: Dan, you've been involved with the prediction-market world for a very long time. Is there anything unexpected you found when you were doing the data analysis for this piece? Did anything contradict your expectations?
Dan Schwarz: The major line of inquiry that I had — and that you had — was "are we getting the public goods that people have predicted?" I basically confirmed the view that I had when I started, which was "no."
Others had written this as well, and Scott Alexander had made the exact same point. I was very happy to do the research to try to verify this and to really dig deep into the data and figure it out, but I didn't fundamentally change my mind on the superficial view, which is: If you look at these prediction markets and you see what's there and you see what the traders are doing and you see what the news is doing and you see what people are seeing on those websites — they are not as promising as people would have predicted if you had simply been told 20 years ago that "there will be prediction markets on public topics with millions of viewers and billions of dollars exchanging hands." I think everyone would have predicted more public value coming out of it.
That said, on basically every detail of that I learned something, and on some things I definitely updated my views. One thing I updated was from the Asterisk editors, which is on the taxonomy itself. What types of public value are even possible?
I originally categorized "early warning" as one of the main categories and I think you correctly noticed that, almost by construction, these prediction markets were not really capable of doing early warning. What I had found was monitoring of risks that had been identified by other people and were now being tracked in these prediction markets. I was confused about this, because I had first learned about COVID-19 from Metaculus in February of 2020.
Clara: Wow.
Dan: So from my lived experience, it was an early warning of "hey, there's this big thing going on." And just by the nature of being on Metaculus, I learned early and made significant life plans that turned out to work very much in my favor. And I thought, "great, if I was able to benefit from this much, just think about all these other millions of people." But the way the prediction markets look now, they're really only interesting once they have quite a lot of trading volume.
So almost by definition, it has to be a publicly known thing before it disseminates out into the news and makes people aware. So no one's going to learn about some sort of oil embargo Iran thing going on from Polymarket. Generally speaking, that's not where the news is being broken, but you can still learn a lot more about a pressing risky issue from the prediction market.
Does money matter?
Clara: This actually relates to one of the big questions I had going into all of this, which is: Are any of these better than Metaculus? For reader context, Metaculus is a prediction aggregation site. It's not a market. There's no money involved, and there are some other subtle differences, but it's also consistently pretty well-calibrated.
Dan: I think it is more accurate, but I can't prove it with the data I have.
This is hard because different questions have different difficulties. You can see this in the literature on forecasting questions going back many years. Different tournaments will sometimes report participants' calibration scores. If you just look at those numbers naively, you think you're evaluating the absolute accuracy, but it's just a function of what the questions were in that tournament or on that platform or on that prediction market. And it varies wildly.
For example, on questions about the future of AI — which are some of the hardest questions to get right — the best accuracy that people are able to achieve is not that much better than random chance, whereas on macroeconomics forecasters are closer to perfect oracles. And Metaculus tends to skew toward very hard scientific questions, while prediction markets tend to skew toward very easy, gossipy ones.
Even when I filtered down to the questions I considered interesting, which is kind of the core conceit of the article, they are significantly easier than the questions on Metaculus on average. And so it's hard to run head-to-head. But if you just take that into account mentally, just adjust for that, then Metaculus questions are kind of scarily accurate for the difficulty of those questions.
Don't hold your breath waiting for new rules
Clara: For years and years the promise of prediction markets has been "once we have real money then we'll get real accuracy." This whole thing is so disappointing, even leaving aside the casino elements and other social harms. I wish there was something more exciting here.
Dan: There definitely still could be. We are in the early days. Academic experts might plausibly say it's going to take longer for the markets to start performing better.
It's true there's been a lot of liquidity for maybe 18 months now on Polymarket, maybe six months on Kalshi, but to an academic looking at this, that's still not very much time. You and I talked about the history of financial markets, and I ended up not really studying it for this piece, but I have the sense that in the first 12 to 18 months after stock markets existed, you weren't getting very many good things out of them.
Clara: Matt Levine wrote about this recently. He was talking about how a lot of smart institutional money avoids prediction markets because there are risk-hedging mechanisms that exist in mature financial markets that prediction markets don't have.
Dan: I did not read that, but it's consistent with what he's written earlier on the topic. And that makes sense. There are — and you and I talked about this briefly in scoping out this article as well — many institutional things that go on in financial markets that the casual prediction market user or observer may not know about.
But when you look at these in financial markets and you look at the absence of them in prediction markets, it's not that surprising that prediction markets are not working great as markets right now. It just takes time to get that infrastructure in place.
Clara: Are you optimistic about that infrastructure evolving?
Dan: The thing that's tricky about it is that we're so dominated by two players: Polymarket and Kalshi. One of the things I find in the article is that for markets that are actually interesting and plausibly useful, it's even more dominated by just Polymarket, with something like eight times as much volume on Polymarket (on questions that I think can help people) versus Kalshi. So to be optimistic about the financial infrastructure of prediction markets is basically to make a claim about one company and what they might do.
And if you look at the track record of that company, Polymarket, they have invested significantly into the crypto infrastructure of their platform. Not my field, but I know there's quite a lot of sophistication there and basically none of the stuff that you would expect from normal financial markets.
And so putting my forecaster hat on, I would predict that Polymarket will continue doing what they've already been doing. So no, I would not expect to get the kind of normal insider trading rules, capital risk controls, all the various things that make financial markets smooth. I would not expect those to show up in Polymarket anytime soon.
Inside Clara are two wolves
Clara: I feel like there's just an important lesson here. In a way, it rhymes with, and is also directly contiguous with, the arc of the rationalist movement as a whole.
Dan: Say more.
Clara: I am a rat. I like rats. But the early movement was so focused on building up tools for thought, the art of human rationality. And that is still there to an extent, but it's really faded away as an explicit focus in favor of more object-level concerns about AI. And I'm not so sure this is a bad thing. I don't want to undersell the activity of trying to do reasoning better, but it just ends up, I think, mattering less than the emergent social dynamics of the community you find yourself in. Will they criticize bad arguments? Do they understand probability? All that seems more important than coming up with some exciting new mechanism or technique.
Dan: I would certainly agree with that. Having been involved in the rationalist community for almost 15 years now, I definitely was attracted for that same reason. It was largely about epistemics: What is true and how do we know that and what is the set of practices and institutions to get there?
And I agree, over the last 15 years I've come more to view truth-seeking as a social process. I'm more like "there is truth, and there are methods of finding it, but the mechanisms that people generally use are so laden with social context and norms that the main things that would help have more to do with those norms." And I think that is part of the optimistic case for prediction markets.
Prediction markets are already changing the common-sense view about how to get information on what's going on in the news and that is very significant. Again, it's not really directly leading to much truth right now, but maybe that norm shift ultimately will turn out to be more important than just getting certain facts better faster.
Clara: I'm of two minds about this. The Puritan part of me wants to say: Is the norm that the news is something that you relate to as a gambling app…good? Do we want that outcome? And the other wolf inside me says: Getting people to think intuitively in terms of probability and uncertainty — that has to be useful.
Dan: The fact that there is probabilistic reasoning at all in a news article is a massive change. You just don't generally get that. I started reading The Economist a couple of years ago, and I really liked it because I felt like they'd have charts, they would have confidence intervals in them, and they would have some forecast and would have a 10% case and a 90% case and a median case.
It felt like they were reasoning about multiple outcomes and our job here is to try to figure out how things that are happening are shifting the distribution in one direction or another. And I felt at the time like I was only getting that from The Economist. I wasn't getting that even from very well-reported things in the mainstream press.
It's extremely hard to figure out what's going on. And I believe this now more after researching this than I did before. I put in the footnotes some of the news articles I found that most prominently placed an actual probability in the headline, even more prominently than a date.
The more that I spend in forecasting, the more I prefer date and numeric forecasts to probability forecasts. I want to know when something is going to happen.
But probabilities have this very nice property that you cannot process the number 58% without thinking about it as a probability, whereas if I tell you something will happen in June 2027, plus or minus X months, you can kind of just pretend that that's a fact about the world, even though it is just a number out of a distribution. So 58% means nothing unless you were thinking probabilistically. To see that in the headlines of major news articles — to me, that is a big change that I think many folks in the epistemics, the rationality, and prediction-market community are very happy to see.
Unsolicited advice for Polymarket and Kalshi
Clara: As someone who's run prediction markets, if you could give any advice to people running Polymarket and Kalshi about how to make them better epistemic tools, what would it be? That's probably not what they want — they're there to make money — but if they were asking you, what would you say?
Dan: By far the easiest thing — and I really do encourage them to do this, and I know there are people who are maybe mutual acquaintances of ours who are in their Discords asking them to do this — is just to write better questions.
Part of the data science that I did for this piece was just sifting through a lot of questions. And there really are some good interesting questions on those platforms that have attracted a large volume and really, it makes everybody happy. Those guys are getting paid, the traders are getting some fun gambling, and the public or policymakers or academics are all learning something useful about the world. So our incentives are aligned in that and the main thing holding that back is simply not having the creativity and the willpower to just write more good questions.
Ultimately they're there to serve their users and they want their bettors to be happy. But there is a significant Venn diagram overlap between things that will totally make the bettors happy and things that are interesting and useful and good. And they should just spend more time writing and administrating those questions.
Clara: It's surprising how hard this is. I've also heard this from folks at Metaculus who consistently say their biggest bottleneck is coming up with good questions.
Dan: Yep. There's no question that it is one of the major bottlenecks. It's a bottleneck to academic research — for example, for the Forecasting Research Institute to be able to run good studies. Writing and resolving and administrating good questions is a bottleneck for them as well. And even for AI development: being able to understand how good various AIs are at different kinds of forecasting, judgment, research, and reasoning is a hard thing.
My company, FutureSearch, has been working on this and trying to publish some stuff to advance this and I know many other folks are working on it too. Again, not so much Polymarket as far as I can tell. I think they might have hired people to work on it but I don't really see much coming out of them indicating they're taking this seriously. It would be very easy for them to do and I highly encourage them to do it.
The art of writing a good question
Clara: Do you want to expand a little on why writing good questions is so hard?
Dan: There was a tournament announced by Mantic, another AI forecasting company. It's a tournament about question-writing, not about forecasting: trying to see who can write the best questions. And one of the key ways they can tell that a question is good is that it causes good forecasters to give different predictions.
The main failure mode in most questions is that they are too trivial. They ask questions where, after 30 minutes of looking into it, there's not really much more that you can say. And so all good forecasters will kind of converge to the same thing. Is the U.S. going to have some recession? Just Google it. It's very easy to see the consensus of economists on things like that.
To me, a good forecasting question is one where, the more a good forecaster — which can be a human, or a team of humans, or an AI system — researches it, the more they update. Their view will fluctuate until they get to some conclusion that was not so obvious when they started researching. How exactly to make questions like that — that's one of the properties I think is most important.
Definitely one of the things from writing this essay that was surprising to me is like, "boy, we're so close to that promise of prediction markets especially for what I care about, which is AI. We're so close to having this great information that is what everybody wants."
Clara: I'm going to ask a more cynical question, which is: Is that potential worth it? Does it justify the gambling and the political insider trading and everything else? How do you think about it holistically?
Dan: As I write in the piece, my sense is that the value of prediction markets is rapidly decreasing because of the value that you can get out of pure AI systems that have no market structure and are not calibrated forecasters. Just ask something directly to Claude and you will get a pretty good answer now.
And that has been improving so quickly that whatever the costs are for providing these prediction markets — whether it's gambling, addiction, insider trading, government regulations, just the opportunity cost of all that money exchanging hands, all those employees, all that infrastructure — it does feel like the value is shifting away from them and towards conventional chatbots, which people can use for free if they don't want to pay the $20 for a better answer. And so I'm not sure if it's worth it now.
I mentioned a reason for optimism — both in this conversation with you and in the piece — which is that prediction markets could change norms around how people think about uncertainty and where their evidence even comes from and I think that could be potentially very valuable.
But in terms of just getting better information — in terms of "I just want better epistemics, I just wanted better information, and I want it to be credible, and I want there to be a mechanism behind that that is trustworthy" — I'm increasingly thinking, "no, it's not worth it" and what we really need is to just get the AI systems that we're all using every day to be better at various epistemic things and forecasting. Research how to judge things, how to deal with uncertainty, how to communicate uncertainty, and things like that.
It really feels to me that in five years people are just going to be getting this from their AIs no matter what prediction markets are doing. So I think it is a central irony that prediction markets are not at all based on AI and don't need AI in any part of their operation — but they are finally taking off right as AI is becoming extremely good at exactly the thing that prediction markets are doing.
The "market" in "prediction market"?
Clara: This is a question I did want to ask and didn't have time to get into the piece, which is: The whole idea of prediction markets is that it's an information aggregator. AI is not doing that. What makes them good at prediction?
Dan: Well, they are information aggregators in the sense that when they are being trained, they are reading everything on some Iran geopolitical thing and synthesizing it. They are training themselves to predict the next word in some news article about what's going on in Iran. And they are using all of the other updates they got from all the other news articles about what's going on in Iran, plus everything that they've learned about the last 10 times something happened in Iran.
A parallel that I like to think about, because I talk to a lot of elite forecasters, is you take an elite forecaster who doesn't really know anything about the topic and you just ask them, "hey, what do you think about what's going on in Iran?" or "what do you think is going on with crypto regulation? what do you think is going on with AI progress at some company?"
And they can just kind of aggregate. Generally, when we say aggregation we mean multiple people, but one individual person is also aggregating information across many sources. They've read many forecasts, they're aggregating across evidence and across time, and it's being synthesized in their brain and then output to you again. That's not generally how people think about it, but now that we have these AIs that are kind of anthropomorphized, and that you kind of talk to as if they were human, it's just much more obvious how much aggregation goes into the pre-training and the post-training of these models. You can ask one five times and take the mean, or it can go out and read five articles and synthesize across them.
But I think your question is great, and it gets at the fundamental value of prediction markets: Why is having this group of people betting against each other the right way of getting that information, when you have other aggregation methods, like training a large language model, which — and, again, I'm stretching it here — is some form of aggregation?
Then you do have to ask which form of aggregation is better. To your earlier question, Metaculus is just a different method of aggregating human intelligence. It doesn't use betting and it doesn't use markets and it is better in some ways. It's generally more accurate but it is much slower to react and so it's much more out of date and various other things like that.
It's true that in the prediction market community there has been a sense that markets are the best way to aggregate any information, that there's nothing you could ever do that will be better than just having market prices clear and having people bet on outcomes. That's the be-all and end-all of aggregation. And I think Metaculus has already shown that, at least for forecasting, that's not necessarily true. And then AI, for me, is saying "no no no no, there are many ways of aggregating disparate information, and you can study them by scoring these forecasts," and it is far from clear that prediction markets are the best way to do that, even though that's kind of their calling card.
Clara: I think that's a good place to leave the interview, thank you so much.
Dan: Thank you, Clara. I really appreciate both the chance for this interview and for writing the piece.