GitHub's Fake Star Economy
210 points - today at 8:26 AM
Comments
Are VCs just that lazy about making investment decisions? Is this yet another side-effect of ZIRP[2] and too much money chasing a return? Is nobody looking too hard, in the hope of catching the next rocket to the moon?
From the outside, investing based on GitHub stars seems insane. Like, this can't be a serious way of investing money. If you told me you were going to invest my money based on GitHub stars, I'd laugh, and then we'd have an awkward silence while I realize there isn't a punchline coming.
Here are the things I look at in order:
* last commit date. Newer is better
* age. Old is best if still updated. New is not great, but tolerable if commits aren't rapid
* issues. Not the count, mind you, just looking at them. How are they handled, what kind of issues are lingering open.
* some of the code. No one is evaluating all of the code of libraries they use. You can certainly check some!
What do stars tell me? They're an indirect variable driven by the things above (real engagement and third-party interest), or otherwise fraud. The only way to tell which is to look at the things I listed anyway.
I always treated stars as a bookmark ("I'll come back to this project") and never thought of them as a quality metric. Years ago, when this problem first surfaced, I was surprised (though in retrospect I shouldn't have been) that they had become a substitute for quality.
I hope the FTC comes down hard on this.
Edit:
* commit history: just browse the history to see what's there. What kind of changes are made and at what cadence. (A rough sketch of pulling these signals from the API follows below.)
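For concreteness, here is a minimal sketch, not the commenter's own tooling, of pulling those signals from the public GitHub REST API. The endpoints and response fields are standard API v3; the helper name `repo_signals` and the example repo are illustrative only.

```python
import datetime
import requests

def repo_signals(owner: str, repo: str) -> dict:
    """Fetch the checklist signals (last push, age, issues, recent commits) for one repo."""
    base = f"https://api.github.com/repos/{owner}/{repo}"
    meta = requests.get(base, timeout=10).json()
    commits = requests.get(f"{base}/commits", params={"per_page": 30}, timeout=10).json()

    now = datetime.datetime.now(datetime.timezone.utc)
    pushed = datetime.datetime.fromisoformat(meta["pushed_at"].replace("Z", "+00:00"))
    created = datetime.datetime.fromisoformat(meta["created_at"].replace("Z", "+00:00"))

    return {
        "days_since_last_push": (now - pushed).days,        # last commit date: newer is better
        "age_years": round((now - created).days / 365, 1),  # old but still updated is best
        "open_issues": meta["open_issues_count"],           # not just the count: go read them
        "recent_commits": [c["commit"]["message"].splitlines()[0] for c in commits[:10]],
        "stars": meta["stargazers_count"],                  # last, and least
    }

print(repo_signals("psf", "requests"))
```

There is deliberately no scoring function here: the point of the checklist is to read the issues and the code, not to reduce them to yet another number.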
Build a SaaS and you'll have "journalists" asking if they can include you in their new "Top [your category] Apps in [current year]" list: you just have to pay $5k for first place, $3k for second, and so on (with a promotional discount on first place, since it's your first interaction).
You'll get "promoters" offering to grow your social media following, which is one reason companies may not even realize that some of their own top accounts and GitHub stars are mostly bots.
You'll get "talent scouts" claiming they can find you experts exactly in your niche, but in practice they just scrape and spam profiles with matching keywords on platforms like LinkedIn once you show interest, while simultaneously telling candidates that they work with companies that want them.
And in hiring, you'll see candidates sitting in interview farms, quite clearly in East Asia, connecting through Washington D.C. IPs, presenting themselves under generic European names in front of synthetic camera backgrounds, somehow acing every question, with CVs that already list experience with every technology you mention in the job post (not hyperbole; I've seen exactly this happen).
If a metric or signal matters, there is already an ecosystem built to fake it, and faking it becomes operationalized, just another part of doing business.
Specifically, someone submitted a library that was only a few days old, clearly entirely AI-generated, and not particularly well built.
In my reply declining to list it, I noted my concerns, among them that it had "zero stars". The author was very aggressive and, in his rant of a reply, asked how many stars he needed. I declined to answer; that's not how this works. Stars are a consideration, not the be-all and end-all.
You need real world users and more importantly real notability. Not stars. The stars are irrelevant.
This conversation happened on GitHub, and since then other developers have wandered into that thread and demanded I set a star-count definition for my "vague notability requirement". I'm not going to; it's intentionally vague. When a metric becomes a target, it ceases to be a good metric, as they say.
I don't want the page to get overly long, and if I just listed everything with X star count I'd certainly list some sort of malware.
I am under no obligation to list your library. Stop being rude.
It’s more expensive to compute, but the resulting scores would be more trustworthy unless I’m missing something.
Why am I not surprised that big capital corrupts everything? Also, Goodhart's law applies again: "When a measure becomes a target, it ceases to be a good measure."
HN folks: what reliable, diverse signals do you use to quickly evaluate a repo's quality? For me it's: maintenance status, age, elegance of the API, and maybe commit history.
PS: From the article:
> instead tracks unique monthly contributor activity - anyone who created an issue, comment, PR, or commit. Fewer than 5% of top 10,000 projects ever exceeded 250 monthly contributors; only 2% sustained it across six months.
> [...] recommends five metrics that correlate with real adoption: package downloads, issue quality (production edge cases from real users), contributor retention (time to second PR), community discussion depth, and usage telemetry.
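A hedged sketch of how one could approximate the "unique monthly contributor activity" metric the article describes, using standard GitHub API v3 endpoints. Pagination, rate limits, and the fact that the issues endpoint's `since` parameter filters on update time (so this over-counts slightly) are all glossed over; the function name and example window are illustrative only.

```python
import requests

def monthly_contributors(owner: str, repo: str, since: str, until: str) -> int:
    """Count distinct logins who opened an issue/PR, commented, or committed in a window."""
    base = f"https://api.github.com/repos/{owner}/{repo}"
    users = set()

    # Issues and PRs (the issues endpoint includes PRs); `since` keys on update time.
    for item in requests.get(f"{base}/issues",
                             params={"state": "all", "since": since, "per_page": 100},
                             timeout=10).json():
        users.add(item["user"]["login"])

    # Issue and PR comments since the window start.
    for c in requests.get(f"{base}/issues/comments",
                          params={"since": since, "per_page": 100},
                          timeout=10).json():
        users.add(c["user"]["login"])

    # Commits in the window; `author` is None when the email maps to no account.
    for c in requests.get(f"{base}/commits",
                          params={"since": since, "until": until, "per_page": 100},
                          timeout=10).json():
        if c.get("author"):
            users.add(c["author"]["login"])

    return len(users)

print(monthly_contributors("psf", "requests",
                           "2024-01-01T00:00:00Z", "2024-02-01T00:00:00Z"))
```

Even this crude version is harder to inflate than stars, since every unit of "activity" leaves a public, inspectable trail.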
I think as a proxy it fails completely: astroturfing aside, stars don't guarantee popularity (and I bet the correlation is very weak; a lot of very fundamental system libraries have few stars). Stars don't guarantee quality either.
And given that you can read the code, stars seem like a completely pointless proxy. I'm teaching myself to skip the stars, skim the code, and evaluate the quality of both architecture and implementation. I've found that quite a few times I prefer a less-"starry" alternative after looking directly at the repo content.
We should do a hall of shame!
* https://arxiv.org/abs/2412.13459 (2024/2025) - Six Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Spams, and Malware
As a side note, it's kind of disheartening that every time there is a metric tied to popularity, there will be some among us who try to game it for profit, basically manipulating our natural bias.
It's also always a bit sad how the parasocial nature of the modern web makes us interface like machines via simple widgets, becoming mechanical robots ourselves, rationalising I/O through simple metrics and forgetting that the map is never the territory.
In my opinion, nothing could be more wrong. GitHub's own ratings are easily manipulated and don't necessarily measure the quality of the project itself, but rather its popularity. The problem is that popularity is rarely directly proportional to quality.
I'm building a product, and I'm seeing that what matters is distribution and communication rather than the development itself.
Unfortunately, a project's popularity is often directly proportional to the communication "built" around it and inversely proportional to its actual quality. This isn't always the case, but it often is.
Moreover, adopting effective and objective project evaluation tools is quite expensive for VCs.
GitHub stars are akin to 'link popularity' or PageRank, which is ripe for abuse.
One way around it is to trust well known authors/users more. But it's hard to verify who is who. And accounts get bought/closed/hacked.
Another way is to open up the algorithm so individuals and groups can shape it, so there's no universal answer for everyone.
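A toy illustration of the first idea, with entirely made-up weights and numbers: weight each star by a crude trust score for the stargazer, so ten thousand week-old throwaway accounts count for less than a few hundred established ones. The `Stargazer` fields and the `trust` formula are hypothetical placeholders, not anyone's real algorithm.

```python
from dataclasses import dataclass

@dataclass
class Stargazer:
    account_age_days: int
    followers: int
    active_own_repos: int

def trust(u: Stargazer) -> float:
    # Hypothetical weighting: saturating credit for age, followers, and real activity.
    return (min(u.account_age_days / 365, 3)
            + min(u.followers / 50, 2)
            + min(u.active_own_repos, 5))

def weighted_stars(stargazers: list[Stargazer]) -> float:
    return sum(trust(u) for u in stargazers)

bots = [Stargazer(account_age_days=7, followers=0, active_own_repos=0)] * 10_000
humans = [Stargazer(account_age_days=2_000, followers=80, active_own_repos=12)] * 300

print(weighted_stars(bots))    # ~192: ten thousand fresh accounts barely register
print(weighted_stars(humans))  # ~2880: a small real community dominates
```

Of course, once such a score mattered, the farms would start aging and cross-following their accounts, which is exactly the PageRank-style arms race the comment alludes to.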
> When nobody is forking a 157,000-star repository, nobody is using it
That is completely untrue. I don't fork a repo when I use it, only when I want to contribute to it (and I usually clean up my forks).
It does feel like everything is a scam nowadays though. All the numbers seem fake; whether it's number of users, number of likes, number of stars, amount of money, number of re-tweets, number of shares issued, market cap... Maybe it's time we focus on qualitative metrics instead?
We figured out a workaround: limit activity to prior contributors only, and add a CI job that pushes a co-authored commit after you pass a captcha on our website. It cut the AI slop by 90%. Full write-up: https://archestra.ai/blog/only-responsible-ai
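A rough approximation of that "prior contributors only" gate, my own sketch using the standard GitHub contributors endpoint rather than the implementation from the linked write-up; it ignores pagination beyond the first 100 contributors, and the triage labels are invented.

```python
import requests

def is_prior_contributor(owner: str, repo: str, login: str) -> bool:
    """True if the login already has merged commits in the repo (first page only)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors"
    contributors = requests.get(url, params={"per_page": 100}, timeout=10).json()
    return any(c["login"] == login for c in contributors)

def triage_new_issue(owner: str, repo: str, author: str) -> str:
    if is_prior_contributor(owner, repo, author):
        return "allow"                 # known contributor: leave the issue open
    return "request-verification"      # new account: ask for the captcha-backed step first

print(triage_new_issue("psf", "requests", "some-new-account"))
```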
I guess it's like fake followers on other social media platforms.
To me, it just reflects a behaviour that is typical of humans: in many situations, we make decisions in fields we don't understand, so we evaluate things poorly.
I'd give a lot of credit to Microsoft and the GitHub team if they went on a major ban/star-removal wave across affected repos, akin to how Valve occasionally does a major sweep across CS2 banning verified cheaters.
I paid GitHub for years to keep my repos private...
But then, I don't participate in the stars "economy" anyway; I don't star and I don't count stars, so I'm probably irrelevant for this study.
It’s supposed to get people to actually try your product. If they like it, they star it. Simple.
At that point, forcing the action just inflates numbers and strips them of any meaning.
Gaming stars so you can showcase them as a positive signal for the product is just SHIT.
> Runa Capital publishes the ROSS (Runa Open Source Startup) Index quarterly, ranking the 20 fastest-growing open-source startups by GitHub star growth rate. Per TechCrunch, 68% of ROSS Index startups that attracted investment did so at seed stage, with $169 million raised across tracked rounds. GitHub itself, through its GitHub Fund partnership with M12 (Microsoft's VC arm), commits $10 million annually to invest in 8-10 open-source companies at pre-seed/seed stages based partly on platform traction.
This all smells like BS. If you are going to do an analysis, you need to do some sound maths on the amount of investment a project gets in relation to its GitHub stars.
All this says is that stars are considered in some ways, which is very far from saying that you buy fake stars and then you get investment.
This smells like bait for hating on people who get investment.
> As one commenter put it: "You can fake a star count, but you can't fake a bug fix that saves someone's weekend."
I'm curious what the research says here: can you actually structurally undermine the gamification of social-influence scores? And I'm pretty sure fake bug fixes are almost trivial to generate with LLMs.
“gstack is not a hypothetical. It’s a product with real users:
75,000+ GitHub stars in 5 weeks
14,965 unique installations (opt-in telemetry, so real number is at least 2x higher)
305,309 skill invocations recorded since January 2026
~7,000 weekly active users at peak”
GitHub stars are a meaningless metric but I don’t think a high star count necessarily indicates bought stars. I don’t think Garry is buying stars for his project.
People star things because they want to be seen as part of the in-crowd, who knows about this magical futuristic technology, not because they care to use it.
Some companies are buying stars, sure, but the methodology for identifying it in this article is bad.