Map of 2000+ lemmy communities (danterious.codeberg.page)
submitted 3 months ago* (last edited 3 months ago) by Danterious@lemmy.dbzer0.com to c/fediverse@lemmy.world

This is my first try at creating a map of Lemmy. I based it on the overlap of commenters between communities.

I only used communities on the 35 most active instances for the past month, and limited the comments to go back no further than August 1, 2024 (sometimes less if I got an invalid response).

I scaled it so that it is based on the percentage of comments made by a commenter in that community.
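
For readers following along, here is a minimal sketch of one way to read this scaling step. It assumes each row is a community and each value is the share of that community's comments written by a given user. The function name `comment_share_matrix` and the sample data are made up for illustration; the actual crawler code is in the linked repository and may scale things differently.

```python
from collections import Counter, defaultdict


def comment_share_matrix(comments):
    """Build a community x user matrix of comment shares.

    comments: iterable of (community, user) pairs, one pair per comment.
    Returns (communities, users, rows) where rows[i][j] is the fraction of
    community i's comments that were written by user j.
    """
    counts = defaultdict(Counter)
    for community, user in comments:
        counts[community][user] += 1

    communities = sorted(counts)
    users = sorted({user for per_user in counts.values() for user in per_user})

    rows = []
    for community in communities:
        total = sum(counts[community].values())
        # Users who never commented here get a share of 0.0.
        rows.append([counts[community][user] / total for user in users])
    return communities, users, rows


# Made-up sample data, not taken from the crawl.
sample = [
    ("c/a", "alice"), ("c/a", "alice"), ("c/a", "bob"), ("c/a", "bob"),
    ("c/b", "bob"),
]
communities, users, rows = comment_share_matrix(sample)
```

Normalizing per community keeps a very busy community from dominating purely because it has more comments overall.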

Here is the code for the crawler and data that was used to make the map:

https://codeberg.org/danterious/Lemmy_map

all 28 comments
[-] PugJesus@lemmy.world 42 points 3 months ago

What do the X and Y Axis represent?

[-] Danterious@lemmy.dbzer0.com 23 points 3 months ago* (last edited 3 months ago)

Well, I used dimensionality reduction to make it 2D, so the axes are just how the algorithm chose to compress the data.

The original data had each community as a data point, with the features being how frequently each user posted in that community.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

[-] threelonmusketeers@sh.itjust.works 10 points 3 months ago

I used dimensionality reduction to make it 2D

Huh, interesting. So is the idea to spread the data out as much as possible, while keeping "similar" communities near each other? What was the dimensionality of the original set?

[-] Danterious@lemmy.dbzer0.com 9 points 3 months ago* (last edited 3 months ago)

Total communities: 2986

Total users: 21934

So the dimensions were reduced from (2986, 21934) to (2986, 2)

Edit: Also, yeah, it is using UMAP for the algorithm, and it does do something pretty similar to what you described.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~
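
As a rough sketch of the reduction described above, from (communities, users) down to (communities, 2), something like the following could be used. It assumes the third-party `numpy` and `umap-learn` packages are installed. The function names `embed_2d` and `shape` and the `seed` parameter are illustrative, and the UMAP settings actually used for the map are not stated in the thread.

```python
def shape(rows):
    """Return (n_rows, n_columns) of a rectangular list-of-lists matrix."""
    return (len(rows), len(rows[0]) if rows else 0)


def embed_2d(rows, seed=42):
    """Project a (communities x users) matrix down to (communities x 2).

    The settings here are assumptions; the thread only says that UMAP was used.
    """
    # Imported lazily so that shape() works without the optional packages.
    import numpy as np
    import umap  # provided by the umap-learn package

    matrix = np.asarray(rows, dtype=float)
    reducer = umap.UMAP(n_components=2, random_state=seed)
    # fit_transform returns an array with one 2D point per input row.
    return reducer.fit_transform(matrix)
```

With the full data set, `shape(rows)` would be `(2986, 21934)` and the result of `embed_2d(rows)` would have shape `(2986, 2)`.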

[-] keepthepace@slrpnk.net 8 points 3 months ago

That's really interesting! It shows which communities share users. I am part of jlai.lu, a French-speaking community that is relatively isolated, but also of slrpnk.net, which seems very spread out!

Would it make sense to compute the standard deviation of each instance's communities? It would give an idea of which are islands and which are more extended. Not sure if it makes more sense to compute it on the 2 dimensions or on the original 21934, though.

[-] Danterious@lemmy.dbzer0.com 6 points 3 months ago

Yeah, that sounds like a good idea, so you can see how connected local communities are. It probably makes more sense to use the original dimensions so no extra information is lost.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~
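
The spread idea discussed above could be sketched roughly as follows. This is an illustration, not something the map does: it treats the "standard deviation" of an instance as the root-mean-square distance of its communities from their centroid in the original feature space. The function name `instance_spread` and the example instance names are hypothetical.

```python
import math
from collections import defaultdict


def instance_spread(vectors):
    """Measure how spread out each instance's communities are.

    vectors: dict mapping 'community@instance' to that community's feature vector.
    Returns a dict mapping each instance to the root-mean-square distance of
    its communities from their centroid.
    """
    groups = defaultdict(list)
    for name, vec in vectors.items():
        instance = name.rsplit("@", 1)[1]
        groups[instance].append(vec)

    spread = {}
    for instance, vecs in groups.items():
        dims = len(vecs[0])
        centroid = [sum(v[d] for v in vecs) / len(vecs) for d in range(dims)]
        mean_sq = sum(math.dist(v, centroid) ** 2 for v in vecs) / len(vecs)
        spread[instance] = math.sqrt(mean_sq)
    return spread


# Made-up vectors: one tightly clustered instance and one spread-out instance.
toy = {
    "a@tight.example": [1.0, 0.0],
    "b@tight.example": [1.0, 0.0],
    "c@wide.example": [0.0, 0.0],
    "d@wide.example": [2.0, 0.0],
}
result = instance_spread(toy)
```

A value near zero would suggest an "island" whose communities share the same commenters, while a larger value would suggest an instance whose communities draw different crowds.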

[-] m_f@midwest.social 19 points 3 months ago

What is !steamdeck@lemmy.world doing over with the red dots 🤔

[-] Danterious@lemmy.dbzer0.com 27 points 3 months ago* (last edited 3 months ago)

Either the people in !steamdeck@lemmy.world are pretty horny or it's an artifact of the dimensionality reduction and means nothing.

Edit: Actually, it could also be that it just didn't collect enough data on that community and the most recent person was also active in NSFW communities. I was only able to get back 14ish days in the data for lemmy.world. They produce way too many comments and I got kicked out early.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

[-] Omniraptor@lemm.ee 1 points 3 months ago

Maybe it would be cool to have an interactive version where you could click on any two nodes and it would tell you the actual distance

[-] Danterious@lemmy.dbzer0.com 0 points 3 months ago

Long distances actually don't mean much; it can't be guaranteed that they correlate to anything. It is mostly the local groups that are conserved, plus a bit of the global structure.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~
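
If someone did want an "actual distance" between two communities, one option is to compute it on the original feature vectors rather than on the 2D map positions. The sketch below uses cosine distance, which is an assumption chosen for illustration; the thread does not say which metric the map was built with, and `cosine_distance` is a made-up helper name.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two feature vectors of equal length.

    0.0 means both communities have the same mix of commenters;
    1.0 means they share no commenters at all.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A community with no recorded comments is treated as unrelated.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Made-up vectors over the same two users.
same_mix = cosine_distance([1.0, 0.0], [2.0, 0.0])
no_overlap = cosine_distance([1.0, 0.0], [0.0, 1.0])
```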

[-] cron@feddit.org 13 points 3 months ago

This community has only two posts and a few comments. The algorithm has very little information on such tiny communities.

It would probably be useful to only include communities with a minimum amount of interaction to avoid such outliers.
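
A minimum-interaction filter like the one suggested here might look like the sketch below. The thresholds and the name `filter_small_communities` are hypothetical; the thread does not propose specific cutoffs.

```python
from collections import Counter, defaultdict


def filter_small_communities(comments, min_comments=20, min_commenters=5):
    """Drop comments from communities with too little activity.

    comments: iterable of (community, user) pairs, one pair per comment.
    Keeps only communities with at least min_comments comments written by
    at least min_commenters distinct users. Both thresholds are placeholders.
    """
    comments = list(comments)
    totals = Counter(community for community, _ in comments)
    commenters = defaultdict(set)
    for community, user in comments:
        commenters[community].add(user)

    keep = {
        community
        for community in totals
        if totals[community] >= min_comments
        and len(commenters[community]) >= min_commenters
    }
    return [(community, user) for community, user in comments if community in keep]


# Made-up sample: "big" has 3 comments from 2 users, "tiny" has 1 comment.
sample = [("big", "alice"), ("big", "bob"), ("big", "alice"), ("tiny", "carol")]
kept = filter_small_communities(sample, min_comments=2, min_commenters=2)
```

Running the filter before building the matrix would keep communities with only a handful of comments from being placed based on one or two users.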

[-] muntedcrocodile@lemm.ee 14 points 3 months ago

The libertarians are right next to the bigtiddygothgf lol

[-] RobotToaster@mander.xyz 13 points 3 months ago

Steamdeck is right next to animefeet

[-] Blaze@feddit.org 13 points 3 months ago

That's great, thanks!

Reposting to !dataisbeautiful@mander.xyz

[-] otter@lemmy.ca 10 points 3 months ago* (last edited 3 months ago)

~~Would you be able to take a screenshot of the map and edit that in as the link URL? Nice thumbnails help a post be seen, and it might let people see the map when the site is getting a hug of death 😄~~

~~Then just have the website link at the top of the post~~

edit: It loaded for me, and I see why a screenshot wouldn't make sense. There's so much cool detail, thanks for sharing!

[-] Danterious@lemmy.dbzer0.com 5 points 3 months ago

I was somehow able to get both a picture and url added and it looks much better. Thx.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

[-] aasatru@kbin.earth 9 points 3 months ago

!dadvice@lemmy.world is there right next to !bellyexpansion@lemmynsfw.com at [9.6, 38.3]. I guess someone is excited about their wives being pregnant. :)

[-] Spot@startrek.website 9 points 3 months ago

This is really awesome! I saw your post the other day about it and thought it was a great idea. You work quick! I already found a new community I would not have thought to look for otherwise.

It is hard for me to see and manipulate on mobile, but that's totally on me. So I'll be back in a bit. I'm sure someone smarter than me may have more helpful input than that if you are looking for feedback!

So I thought I was gonna head to bed, but... guess I can stay up and peruse for a little while...

Thank you!!

[-] Fizz@lemmy.nz 7 points 3 months ago* (last edited 3 months ago)

Pretty cool graph. It was funny to see the two lemmy.world porn communities in a sea of lemmynsfw. I was completely unaware lemmy.world hosted porn.

[-] cabbage@piefed.social 6 points 3 months ago* (last edited 3 months ago)

Very cool!

Do you have any idea how taxing scraping this data is for the servers?

If this is something you want to keep working on, maybe it could be combined with a sort of Threadiverse fund raiser: we collectively gather funds to cover the cost of scraping (plus some for supporting the threadiverse, ideally), and once we reach the target you release the map based on the newest data and money is distributed proportionally to the different instances.

Maybe it's a stupid idea, or maybe it would add too much pressure into the equation. But I think it could be fun! :)

[-] Danterious@lemmy.dbzer0.com 2 points 3 months ago

I had to try scraping the websites multiple times because of stupid bugs I put in the code beforehand, so I might have put more strain on the instances than I meant to. If I did this again, it would hopefully be much less taxing on the servers.

As for the cost of scraping, it actually isn't that hard; I just had it running in the background most of the time.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

[-] finitebanjo@lemmy.world 5 points 3 months ago* (last edited 3 months ago)

So more dots means more total activity by that community's users on any community in the top 35?

Wouldn't a bar graph be sufficient?

[-] Danterious@lemmy.dbzer0.com 11 points 3 months ago* (last edited 3 months ago)

For example, most of the red dots to the top right are NSFW communities, and the algorithm was able to clump them like that because the people who comment in those communities tend to comment in the other NSFW communities as well.

edit: left -> right

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

[-] finitebanjo@lemmy.world 8 points 3 months ago

Ah cool, self segregating proximity map. The format makes more sense with this explanation, thank you.

[-] Danterious@lemmy.dbzer0.com 8 points 3 months ago

I didn't measure activity for this map. Each dot represents a community. I only used the communities that were on the top 35 instances (except lemmings.world which it couldn't grab any comments for.)

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

[-] finitebanjo@lemmy.world 3 points 3 months ago

So number of dots is number of communities, but the data relies on comments to those communities to appear in the set?

[-] Danterious@lemmy.dbzer0.com 7 points 3 months ago* (last edited 3 months ago)

Yeah, pretty much. I wanted to see communities that had similar people commenting, because I thought that would be a good way to see if similar kinds of discussions were happening in those communities.

~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~

this post was submitted on 11 Sep 2024
188 points (98.0% liked)
