OkCupid 99%ers

I wrote a Python script to do some analysis of my 99% matches on OkCupid, the dating website. It might be against their Terms of Use, but I interpret

“…you agree that your use of the Website shall be for bona fide relationship-seeking purposes (for example, you may not use the Website solely to compile a report of compatible singles in your area, or to write a school research paper).”

(emphasis added) as meaning that what I’m doing is probably fine since I actually would like to find someone.

The script is in a repo that can be found here, and requires mechanize, lxml and sort of matplotlib. It also saves what it finds with pickle, to be nice to OkC’s servers, but making use of that data is kind of iffy right now.
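For anyone curious what the pickle caching amounts to, here’s a minimal sketch of the idea, not the actual code from the repo. The file name `okc_matches.pickle` and the `load_or_fetch` helper are made up for illustration, and `fetch` stands in for whatever function does the real scraping.

```python
import os
import pickle

CACHE_PATH = "okc_matches.pickle"  # hypothetical file name, not from the repo


def load_or_fetch(fetch, path=CACHE_PATH):
    """Return cached results if the pickle file exists; otherwise call
    fetch() once, save what it returns, and hand that back."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = fetch()  # the only place that would hit OkC's servers
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Calling `load_or_fetch(scrape_matches)` a second time would read the file instead of making new requests (`scrape_matches` being a placeholder for the scraping function).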

Here’s the first image, a simple one.


I’m 21 years old so it’s not surprising that most of my matches are grouped around my own age. All the bar graphs have this shape, a nice bell curve. You can see a few outliers here, though. The total sample of 99% matches world-wide was 208 people, I think. I can’t remember exactly (you can reverse-engineer it from the script) but I think that I searched by match percent, for people with images who had been online in the last month. I filtered by women, but not relationship status (more on that later).

Here’s a bar graph of my friend percents. I never understood what it meant or what it was for, and this wasn’t illuminating at all.

Friend Percents

And here’s the enemy percent, which I pretty much feel the same way about.

Again, nice bell curves, with the only notable difference being that the enemy values were much lower and more concentrated. Next up is sexual orientation.

Now that I think about it, I’m not sure if the search was supposed to turn up gay women. I assume not, as in a sample this big I’d expect a few to show up. Oh well. It’s interesting that the women who I have high match rates with identify so frequently as bisexual. There’s the issue that a bunch of them may be lying but I can’t easily find out more on that. I also wondered how this compares to the general population, but I couldn’t find bi/pan-sexuality rates with some quick and dirty Google searching and they didn’t have an easy number (even for self-identification) on the OkTrends blog.

It doesn’t turn up in this data, but my high match rate matches tend to be queer or queer friendly, so I guess in this data set it’s more likely than the general population that they’re pansexual and attracted primarily to feminine/masculine and just haven’t come across other people like that. Anyway, gender is tricky!

Two more images. A kinda boring one, then my favorite.

Here are the reply rates in a nice pie chart, which isn’t too interesting to me. Most of my matches are apparently the most selective, followed by very unselective. This data might be more interesting to me if I knew what the general population looked like. Whatever. If someone else has anything interesting to say I’d like to hear it, but let’s get to my favorite pie chart!

(“seeing” means “seeing someone”. It was easier to code this way though.)

I love this one because the result is relatively unexpected (for me anyway). Only just over half of my matches on a dating site are single? Wtf? It makes more sense if you take a second to think about it, and know what the profiles look like (which you don’t, but I do). A ton of my matches are polyamorous or in open relationships. What this means for me is tricky, but in terms of just looking at this data, that’s where that comes from, and it makes this pie chart make more sense.

Lastly, I looked at where people were in cities/states (including nations outside the US as states). I tried making a pie chart out of the states, but it was too crowded.


Canada (13, 6.25%)
Ontario (4)
British Columbia (3)
Alberta (2)
Manitoba (2)
Quebec (2)

Washington (15, 7.21%)
Seattle (10)
Mount Vernon

Massachusetts (16, 7.69%)
Boston (5)
Somerville (3)
Chestnut Hill
Jamaica Plain

New York (20, 9.62%)
New York (6)
Brooklyn (4)
Ithaca (2)
Jackson Heights

California (23, 11.06%)
San Francisco (5)
Los Angeles (4)
Berkeley (4)
Los Gatos
Menlo Park
Mill Valley
Anaheim Hills
Newbury Park
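The percentages in the list above are each group’s share of the 208 matches. Here’s a small sketch of that arithmetic using the five counts from the list; the `share` helper and the dictionary are just for illustration and aren’t taken from the script.

```python
TOTAL_MATCHES = 208  # size of the 99% sample

# Counts copied from the list above (only the five groups shown there).
region_counts = {
    "Canada": 13,
    "Washington": 15,
    "Massachusetts": 16,
    "New York": 20,
    "California": 23,
}


def share(count, total=TOTAL_MATCHES):
    """Percentage of the whole sample, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Print smallest to largest, in the same "Name (count, percent)" style.
for region, count in sorted(region_counts.items(), key=lambda kv: kv[1]):
    print("%s (%d, %.2f%%)" % (region, count, share(count)))
```

These five groups add up to 87 of the 208 matches, so the rest are spread over all the other places.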

What more can I do? The next thing I want to do is make a word cloud (or whatever they’re called). I’d have to scrape the 208 profiles to do this though, and even though I only made like 100 requests to compile this data, I’m gonna save that for another day. I expect words like “feminist”, “liberal”, “queer” and the like to show up a lot, but I’m looking forward to finding out new stuff! Also, some nice pie charts on religion and “looking for” would be nice.
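The counting half of a word cloud is simple enough to sketch already, assuming the profile text has been scraped into a list of strings (which I haven’t done yet). The `word_counts` function, the tiny stopword list and the example strings are all made up for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; a real one would be longer.
STOPWORDS = {"i", "a", "am", "an", "and", "the", "to", "of", "in", "is", "my"}


def word_counts(texts, stopwords=STOPWORDS):
    """Count lowercase words across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


# Made-up example strings, not real profile text.
sample = ["I am a feminist and a liberal", "Queer feminist"]
print(word_counts(sample).most_common(3))
```

The resulting counts could then be fed to whatever draws the actual cloud.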

Something I tried to do was order the 99%ers (just for curiosity) but that part of the code doesn’t seem to work. I made some assumptions which might not be true, and used my friend’s code without understanding it, since I was busy writing the rest of this. Another possible issue was that the data changed as I was scraping it.

Feedback on this would be cool. Ideas to extend it, what I could have done differently, or even vaguely related stuff would be awesome to hear about.