The NYT Netflix maps are just the latest in a series of ways geography is being used "interestingly" to make a "point" of some sort that's never actually asserted by the cartographer, but, rather, inferred by the user. This kind of stuff annoys me probably in the same way my using stats annoys scientists (not to say I'm a geographer. I'm not. But I fret about these issues a whole lot). It's facile and leads to self-satisfied conclusions that the data doesn't, actually, support.
Problem 1: The areas are broken up by ZIP code. I think the problems this raises are highlighted by the case of Hyde Park, which straddles at least two ZIP codes. You force a kind of "neighborhood" upon the environment that doesn't actually exist there. In some ways this is good, since it randomizes (to a degree) the boundaries. But in other ways it's bad, since 60637 starts to mean something sociologically/anthropologically, not just, you know, postally.
In general, if you're trying to determine something about patterns of some sort, you want to split the study area into quadrats (the number of quadrats determined by comparing the total number of observations to the entire study area). That, of course, requires a certain amount of granularity in the data that Netflix might not provide (or that the NYT might not be interested in working through). But we can't assume that we know something about "a ZIP code" based on the fact that its residents rented x movie more than y.
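To make the quadrat idea concrete, here's a rough sketch (mine, not anything the NYT or Netflix did). It assumes you have point-level locations for each rental, which is exactly the granularity we don't have. The "half as many quadrats as observations" line is one commonly cited rule of thumb (quadrat area of about twice the study area divided by the number of points), not the only option, and the function names are made up for illustration.

```python
import math
from collections import Counter


def rule_of_thumb_quadrats(n_points):
    # ASSUMPTION: quadrat area ~ 2 * study_area / n_points, which works
    # out to roughly n_points / 2 quadrats. Other rules exist.
    return max(1, n_points // 2)


def quadrat_counts(points, width, height, n_quadrats):
    """Count points in a grid of roughly n_quadrats equal cells."""
    cols = max(1, round(math.sqrt(n_quadrats * width / height)))
    rows = max(1, round(n_quadrats / cols))
    counts = Counter()
    for x, y in points:
        # Clamp so points sitting on the far edge land in the last cell.
        c = min(int(x / width * cols), cols - 1)
        r = min(int(y / height * rows), rows - 1)
        counts[(r, c)] += 1
    return [counts[(r, c)] for r in range(rows) for c in range(cols)]


def variance_mean_ratio(counts):
    """About 1 for a random (Poisson) pattern, >1 clustered, <1 evenly spread."""
    mean = sum(counts) / len(counts)
    var = sum((k - mean) ** 2 for k in counts) / (len(counts) - 1)
    return var / mean
```

The point of the ratio is that the grid is imposed by you, evenly, instead of inherited from the post office, so a clustered pattern shows up as clustering rather than as "what 60637 likes."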
(This returns to the one way in which ZIP codes are good, as I hinted above. They are only largely based on preexisting boundaries (of cities, towns, etc.) as opposed to entirely, like ward boundaries. In that sense, they shake up the possible sample you get in each code. The idea of gerrymandering a ZIP code only makes sense in LA.)
So I don't particularly think that ZIP codes are a revealing means of looking into what's going on. Fun, yes. Which leads me to point 2.
Problem 2: It's irresponsible to throw out data like this and let it sit to be played with, in my opinion, without another variable or something to provide context. What the NYT has provided us with is basically a big toy. As I said to Ben, it's interesting, but only like a crossword puzzle is interesting. I love crossword puzzles, and I love doing them, but I don't post about them or forward links to them, since I can't escape Postman's criticism of crossword puzzles as basically what overeducated and understimulated people do out of intellectual boredom. Is anyone surprised that the South Side of Chicago likes Tyler Perry? So what does it mean to point it out, other than to recycle something people would have already pretty much assumed? Without context, the analysis becomes circular, flattering the viewer into making conclusions he or she already suspected.
Problem 2a: There are no numbers that would help us analyze the data better. As in, we have no idea how many Netflix subscribers are in each ZIP code, either in toto, or as a percentage of population. Furthermore, we don't know how many movies, total, get shipped to each ZIP code. Finally, we have no idea how much space is between #1 and #5 in any ZIP code, yet those present fixed differences in coloring. That one lone ZIP code that really loves Rachel Getting Married? Maybe there's just one household in the entire ZIP with a one-at-a-time plan that has a serious erection for TV on the Radio. Again, this is a lack of context.
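Here's a toy illustration of the #1-versus-#5 complaint. The numbers and film names are invented (that's the whole problem: we don't have the real ones), and `rank_and_share` is just a helper I'm making up. Two ZIP codes can produce the same ranking, and so the same colors, while meaning completely different things.

```python
def rank_and_share(rentals):
    """Return (title, rank, share of that ZIP's rentals), best-ranked first."""
    total = sum(rentals.values())
    ordered = sorted(rentals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (title, rank, count / total)
        for rank, (title, count) in enumerate(ordered, start=1)
    ]


# HYPOTHETICAL data: a big ZIP with a near three-way tie,
# and a tiny ZIP where one household decides everything.
big_zip = {"Film A": 500, "Film B": 490, "Film C": 480}
tiny_zip = {"Film A": 3, "Film B": 1, "Film C": 0}

for name, zip_data in (("big", big_zip), ("tiny", tiny_zip)):
    for title, rank, share in rank_and_share(zip_data):
        print(f"{name:>4} ZIP  #{rank}  {title}  {share:.0%} of rentals")
```

Same rank order in both, but in the big ZIP the top film barely edges out the others, and in the tiny one "most popular" is three discs.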
Problem 3: Autocorrelation. Basically, this means that similar observations tend to cluster. It's kind of a problem for geography, from my understanding, since one is always trying to figure out how much of the data is tainted by autocorrelation. If the point of the maps is to show that, yes, this shit is hell of spatially correlated, well, big deal. Again, there's nothing new in telling me that shit is spatially correlated. I know that it's statistically very likely for adjacent ZIP codes to have similar renting patterns. I would like to know, in seeing this data, what kind of built-in issues it has with autocorrelation, etc. Here's where something like Moran's I comes in handy. It tells you whether, and how strongly, the dataset is spatially autocorrelated, allowing you to then more comfortably make conclusions about the distribution of rental patterns. (I clean this up in the comments.)
And then, when there are breaks, like in HP, we're not equipped to understand if that's random or an actual blip, since, again, we're provided with such crappy data. We all *assume* that it's because of the UofC that Slumdoggy was so popular in 60637 (or at least that's what Mario Small suggested in his blog post that alerted me to the site in the first place), but we don't *know* that. And we have no way, with what we're given, to guess how much is the UofC or not.
Which leads, again, to circular and convenient conclusions that flatter our prejudices.
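For the Moran's I bit above, a bare-bones sketch of the global statistic, assuming you had one number per ZIP code (say, a film's share of rentals) and knew which ZIPs border which. The adjacency matrix and values here are invented, and this is the textbook formula in plain Python, not a claim about how anybody's GIS package does it.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  one observation per area (e.g. rental share per ZIP)
    weights: n x n matrix, weights[i][j] > 0 if areas i and j are neighbors
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    spread = sum(d * d for d in dev)
    return (n / w_total) * (cross / spread)


# HYPOTHETICAL: four ZIP codes in a row, each bordering the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 1, 0, 0], chain))  # like next to like: positive
print(morans_i([1, 0, 1, 0], chain))  # alternating: negative
```

With no autocorrelation the expected value is -1/(n-1), so close to zero for a big map. To say whether an observed value (or a local break like HP) is more than noise, you'd still need something like a permutation test: shuffle the values across ZIPs many times and see where the real number falls.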
So I've got to go to a booze tasting now, but that's my crank attitude for now.
I'll add one last thing: the eye is a bad mathematician, and it is way too eager to see patterns where there are none, which is why it's so easy to lie with maps. Of course, I understand that this is all fun and a way to burn some time (see Postman and crosswords above), but I've seen the explosion of this kind of mapping shit lately as a threat to real geospatial analysis. I dunno. Forward this on to Conzen and see if he thinks I'm crazy. I'm willing to be told I am. But it doesn't change the fact that, in my work, I have to compete with jokes like the Google Books map that accompanies every novel.
OKOKK... last thing... What I would've liked is a per-movie distribution as a hotzone, so, not bounded by ZIP codes. That would've been more interesting and sociologically useful, since it would help account for the variations in potential renting diversity within each ZIP.
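One way to get that kind of hotzone is a kernel density surface. This is only a sketch of the idea: it assumes point locations for each rental of a given movie (which nobody has released), the coordinates below are invented, and the bandwidth is a knob you'd have to argue about.

```python
import math


def kernel_density(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y) from rental locations."""
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    # Normalize so the surface integrates to 1 over the plane.
    return total / (len(points) * 2 * math.pi * bandwidth ** 2)


def density_grid(points, xs, ys, bandwidth):
    """Evaluate the density on a regular grid; rows follow ys, columns xs."""
    return [[kernel_density(points, x, y, bandwidth) for x in xs] for y in ys]


# HYPOTHETICAL rental locations for one movie: a tight cluster and a stray.
rentals = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (5.0, 5.0)]
grid = density_grid(rentals, xs=[0, 1, 2, 3, 4, 5], ys=[0, 1, 2, 3, 4, 5], bandwidth=0.5)
```

The surface peaks wherever the rentals actually bunch up, regardless of where a ZIP boundary happens to run, which is the whole appeal.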