What people say GLP‑1 side effects feel like (and why Reddit data isn’t the same thing as safety data)

Drug labels are written in a language that is both necessary and incomplete.

Necessary, because medicine needs shared terms, thresholds, and definitions. Incomplete, because the lived experience of a side effect is often a story about timing, intensity, and trade‑offs—things that don’t compress neatly into a table.

GLP‑1 receptor agonists, and newer incretin combinations like tirzepatide, sit right at that tension. These drugs now have millions of users, which means two things become true at once: formal trial evidence matters more than ever, and the gap between “what trials captured” and “what people experience day to day” becomes more visible.

A 2026 medRxiv preprint tries to map that gap using an unconventional dataset: Reddit posts.

The authors analyzed 410,198 posts (from May 2019 through June 2025) that mention semaglutide or tirzepatide, identified 67,008 users who self‑reported taking one of the drugs, and then summarized which side effects those users talked about (preprint).

The headline finding is not surprising if you’ve followed the class: gastrointestinal symptoms dominate. The more interesting part is the framing question the paper forces: what can a massive corpus of self‑report tell us that trials and labels might miss, and what can it never tell us?

What this study is actually measuring

If you want to read this preprint cleanly, the first move is to name the unit of analysis.

This is not an adjudicated adverse event registry. It’s not pharmacovigilance in the classical sense where a clinician confirms timing, dose, competing causes, and seriousness.

It’s closer to a measurement of salience.

In plain terms, it helps answer: “When people who say they’re using these drugs talk online, what symptoms do they choose to write about?”

That is a different question than “What is the incidence of side effect X in the population?”

The difference matters because online discussion is shaped by who posts, what they feel comfortable describing, what communities normalize, and which symptoms have a narrative hook. If nausea ruins your ability to work, you may post. If a subtle lab abnormality shows up on routine testing, you may never mention it. If a symptom is stigmatized, it may be under‑reported. If a symptom has become a meme, it may be over‑represented.

So treat this preprint as a map of conversation that may contain early clues—not as a substitute for trials, labels, or formal reporting systems.

The dataset and the basic numbers

The authors report analyzing 410,198 Reddit posts that mention semaglutide or tirzepatide. From those, they identified 67,008 users who self‑reported taking the medications.

Among those self‑reported users, they report that 43.5% described at least one side effect.

Then they rank symptom categories by how often they were mentioned.

In this dataset, the most commonly described side effects were:

Nausea (36.9%), fatigue (16.7%), vomiting (16.3%), constipation (15.3%), and diarrhea (12.6%).

If that list feels familiar, that’s because it overlaps strongly with what clinicians already expect from incretin‑based therapies: slowed gastric emptying and altered gut signaling are part of how these drugs work, and they also explain why GI tolerability becomes the limiting factor for many users.

The value of seeing those numbers in Reddit data is not in “discovering” nausea. It’s in quantifying just how dominant the GI conversation is in a large, naturalistic corpus.

The under-discussed clusters the authors highlight

The authors flag two clusters that stood out in their analysis as “unrecognized potential effects”: reproductive symptoms (for example, menstrual irregularities) and temperature‑related complaints (for example, chills and hot flashes) (preprint).

This is the point where it’s easy to slide into the wrong conclusion.

A cluster appearing in self‑report does not prove the drug caused it. People start GLP‑1 drugs during periods of rapid weight change, dietary shifts, altered sleep, new exercise patterns, and sometimes other medication changes. Any of those can plausibly affect menstruation, perceived temperature regulation, or autonomic symptoms.

But it would also be a mistake to dismiss the clusters as irrelevant.

One legitimate use of this kind of analysis is hypothesis generation: it can tell researchers and clinicians which experiences are common enough, or important enough to patients, that they might be worth asking about explicitly in prospective studies.

If you’ve ever watched safety conversations evolve, you’ll recognize the pattern. Some effects become well‑documented only after enough people report them, and after clinicians begin to look for them systematically. Social data can act as an early nudge toward “maybe we should measure this,” even when it can’t settle causality.

Why Reddit cannot give you incidence (and why people still want it to)

When a paper says “X% of users reported Y,” the human brain wants to read that as incidence.

But the denominator in this study is not “all people taking semaglutide.” It’s “people who posted about semaglutide on Reddit and also self‑reported taking it.” That is a very specific slice.

People who post may be more likely to have questions, problems, or strong experiences. Communities may attract people who are struggling or troubleshooting. There is also the fundamental ambiguity of self‑report: the drug name might be wrong, the formulation might differ, the timing might be off, and comorbid conditions are rarely controlled for.

There’s also a second kind of distortion: narrative selection.

Some side effects are “talkable.” Some are not. Some are easy to describe. Some require lab testing. Some feel relevant to the identity story people are telling online (“I’m doing GLP‑1 to change my life”). That selection pressure shapes what ends up in posts.

So the cleanest way to translate the results is: this is a ranked list of symptom themes that show up in discussion, not a reliable estimate of how often a symptom occurs.

What this kind of paper is good for

Despite the limits, there are real reasons to pay attention.

First, it can surface what patients are preoccupied by. That matters because patient experience affects adherence, dose escalation decisions, and whether people seek medical advice or instead “solve” problems through unvetted internet strategies.

Second, it can identify vocabulary mismatches. Patients may describe symptoms in ways that don’t map neatly onto clinical terms. Social data can show the language people naturally use, which can be useful when designing patient‑reported outcome instruments.

Third, it can spotlight candidate clusters for prospective measurement. Even if a signal is confounded, it can still be clinically useful to ask: are temperature complaints common during initiation? Do menstrual cycle changes track with weight loss rate, caloric intake, stress, or medication timing? Are they different by drug?

Finally, it can function as an early warning system in the weak sense: it can tell you what people are noticing. Formal systems are still needed to determine what is true.

How to use this as a reader (without over-reading it)

A reasonable way to hold this preprint is as a “symptom attention heatmap.” It shows where a large online community puts its attention when discussing semaglutide and tirzepatide.

The safest interpretation is not “these drugs cause X% chills.” It’s “some people are describing chills and hot flashes often enough that it might be worth asking about more directly.”

If you want the incidence and severity profile, that still lives primarily in randomized trials, observational studies with defined populations, and regulatory documents. (For example, the Drugs@FDA page for tirzepatide products links to the current label PDFs and their adverse reaction sections: NDA 215866.)

The bigger takeaway is about how modern medicine gets narrated. When a drug becomes a cultural object, the safety conversation no longer happens only in journals and labels. It happens in millions of tiny stories.

Studies like this one don’t replace the old systems. They help us notice what the old systems may not have been optimized to hear.