0
0

Price to rent ratio for every zip code in the US


 invite response                
2012 Feb 1, 5:02am   31,943 views  60 comments

by Patrick   ➕follow (60)   💰tip   ignore  

Just for kicks, I plotted the average price vs average rent for every zip code in the US from my mostly-Craigslist data set of 4.5 million points.

Pretty cool, but I'm not sure what it means except that there is basically a linear relationship.

Please copy this graph and use it where ever you want, as long as you keep the http://patrick.net URL in the image.

Here's the gnuplot commands I used:

set term gif
set output "price_to_rent.gif"
set size 0.8, 0.8
set label 'From http://patrick.net' at graph 0.01, graph 0.08
set xlabel "Average price in zip code"
set ylabel "Average rent in zip code"
set xrange [0:800000]
set yrange [0:3000]
set title "Average price vs average rent by zip code"
plot "price_to_rent.txt" using 2:3 notitle pt 0

Anyone know how to make the text of the url label lighter, or how to put commas and $ marks on the rents and prices?

« First        Comments 4 - 43 of 60       Last »     Search these comments

4   David9   2012 Feb 3, 2:23am  

Looks like 1% and 99% to me, lol (Actually, it does)

5   StoutFiles   2012 Feb 3, 2:39am  

This graph isn't linear. It's y = square_root(x).

6   Patrick   2012 Feb 3, 3:08am  

You mean zip codes from 00000 to 99999? I think it would just be noise, wouldn't it?

OK, I did it. Here:

I guess there are clearly higher prices on the coasts (the very low and very high zip codes).

7   Patrick   2012 Feb 3, 3:12am  

jonstick says

Lots of scatter, but definitely a correlation. It is concentrated on the low end. Could you plot it with a log-log scale?

OK, here:

Damn, when I upload the pictures, the auto-resizing blurs them a bit.

9   Dan8267   2012 Feb 3, 3:22am  


OK, what graph do you want?

Have the dots be hyperlinks to email addresses and Facebook pages.

10   Patrick   2012 Feb 3, 3:23am  

Those low rental numbers happen where there are very few rentals. So maybe some crappy $500/month apartment was the only rental in that otherwise expensive neighborhood.

11   Dan8267   2012 Feb 3, 3:24am  

John Bailo says

How about making the x axis the zipcode number itself.

That would be a meaningless axis.

12   Patrick   2012 Feb 3, 3:24am  

Actually, there is a bit of meaning in that graph, since the zip codes track east to west across the US.

13   bmwman91   2012 Feb 3, 3:25am  

What is your input data comprised of? Is it 3 columns (zip code, avg sales price, avg rent price)?

Also, I have some hefty statistical analysis software at my disposal. If you can package up the data or get me access to the database, I can import it & run some analyses on it over one of my lunch breaks.

14   Dan8267   2012 Feb 3, 3:27am  


Those things happen where there are very few rentals. So maybe some crappy $500/month apartment was the only rental in that otherwise expensive neighborhood.

It would probably be best to compare only houses, not condos and apartments. Ideally, you'd compare a house that's for sale to a similar house that's for rent. Of course, whether or not you can do this depends largely on your data source. Perhaps scrapping ZWillow would be adequate.

15   Dan8267   2012 Feb 3, 3:33am  


Actually, there is a bit of meaning in that graph, since the zip codes track east to west across the US.

It would be far better to graph f(zipcode) where f is a function that returns the mean longitude of the zip code. Otherwise, your trying to infer an axis that isn't well represented by the actual axis.

As you increment the zip code, you can zig-zag quite a bit, particularly if the zip code is a newer one that has been branched off from an older zip code as population increased.

Better yet, make a 3D graph with latitude and longitude as x and y coordinates and the z coordinates is the rent/price ratio. That way you take into account warm vs cold climates.

There is some method to the zip-code assignment, but translating zip-codes to what you want is much better than using the zip code itself.

16   Patrick   2012 Feb 3, 4:12am  

bmwman91 says

What is your input data comprised of? Is it 3 columns (zip code, avg sales price, avg rent price)?

Also, I have some hefty statistical analysis software at my disposal. If you can package up the data or get me access to the database, I can import it & run some analyses on it over one of my lunch breaks.

Sure, that would be fun! Here is the data:

http://patrick.net/contrib/price_to_rent.txt

It's about 400KB in size.

I'm not very good with 3D graphs. Anyone who wants to, please make your own graph and post it here. Thanks!

17   Jimbo in SF   2012 Feb 3, 6:19am  

Slightly different graph, looking at 1st 3 digits of the zip code (which groups by cities or regions).

Avg. Price vs Avg. Rent: r=0.76 r2=58%

18   Peter94087   2012 Feb 3, 7:18am  

Could you add overlay lines representing:
annual rent = price*3%,
annual rent = price*6%,
annual rent = price*9%

I think that would put these values in context.

19   Patrick   2012 Feb 3, 7:18am  

Dunno how to easily do a median in MySQL. Here's what I did to extract from my giant forsale and forrent tables:

create table price_by_zip select avg(price) as price, zip from forsale group by zip;
create table rent_by_zip select avg(rent) as rent, zip from forrent group by zip;
CREATE INDEX zip_index ON price_by_zip (zip);
CREATE INDEX zip_index ON rent_by_zip (zip);
create table price_to_rent select price_by_zip.zip, price, rent from price_by_zip, rent_by_zip where price_by_zip.zip=rent_by_zip.zip;
echo "select * from price_to_rent" | m > price_to_rent.txt
("m" is an alias for command-line mysql)

Wish they had a median function.

20   CBOEtrader   2012 Feb 3, 7:47am  

SFace says

Completely uesless.

Expand your mind a bit. This graph kicks ass.

Sure, a for sale listing is a single family home, and a for rent is an apartment-- but this would basically be similar in each zip code. So, you clearly can't tell what rent to expect out of every 100k in real estate investment.

This graph does do a great job of directing an investor's attention towards a zip code with comparatively higher rents to purchase price though. This data could save a real estate investment trust fund manager a lot of dirty work time.

21   Dan8267   2012 Feb 3, 7:55am  


Sure, that would be fun! Here is the data:

The data source leaves a lot to be desired. It doesn't let you distinguish between types of housing or even house size. There's not too much you can infer from that particular data set. Best to get something more detailed and calculate the averages yourself in SQL.

22   Dan8267   2012 Feb 3, 8:12am  


Dunno how to easily do a median in MySQL.

Off the top of my head and without the benefit of a script IDE, but the following TSQL (mostly SQL, maybe a little specific to MS SQL Server). The semicolons are optional, but I like to use them as they make intent clearer.

declare @Temp table (Row int identity(0, 1), Zip int, Price float, Rent float);

insert into @Temp (Zip, Price, Rent)
select Zip, Price, Rent
from PriceRentByZip
order by Price;

declare @MedianPrice float;

select @MedianPrice = price
from @Temp as Temp
where Temp.Row = count(Temp.Row) / 2;

-------------------------------------------

declare @Temp2 table (Row int identity(0, 1), Zip int, Price float, Rent float);

insert into @Temp2 (Zip, Price, Rent)
select zip, price, rent
from PriceRentByZip
order by Rent;

declare @MedianRent float;

select @MedianRent = price
from @Temp2 as Temp
where Temp.Row = count(Temp.Row) / 2;

There's usually a way to reset the seed value of auto-identity columns, in which case you could just delete form @Temp and wouldn't need @Temp2.

Note, you can select other columns from the row as necessary.

23   Patrick   2012 Feb 3, 8:46am  

CBOEtrader says

Expand your mind a bit. This graph kicks ass.

Thanks! This is just the beginning. I have a list of other graphs I plan to make. Suggestions appreciated too.

Dan8267 says

The data source leaves a lot to be desired. It doesn't let you distinguish between types of housing or even house size. There's not too much you can infer from that particular data set. Best to get something more detailed and calculate the averages yourself in SQL.

I do have more detailed data of course. I just didn't want to give it away after all the work and time it took to collect it. Still trying to figure out a business model for Patrick.net that might actually work.

Dan8267 says

insert into @Temp (Zip, Price, Rent) select Zip, Price, Rent from PriceRentByZip order by Price; declare @MedianPrice float; select @MedianPrice = price from @Temp as Temp where Temp.Row = count(Temp.Row) / 2;

Oy. Why doesn't MySQL just have a median() function the way they have avg()? It would not be hard for them. Maybe the latest MySQL does have it...

24   Dan8267   2012 Feb 3, 12:47pm  


Why doesn't MySQL just have a median() function the way they have avg()?

In order to calculate the median of a set, you have to order the set. In order to calculate the average, you don't.

Technically, you could write a function to calculate the median, but that function would just be sorting the list and picking the middle item. In SQL Server Studio, you could write a CLR function in C# and call that from your TSQL code. There's probably similar functionality in MySQL, but I don't know what it is off the top of my head.

In any case, most DB providers won't give you a function that requires allocating a heck of a lot of memory to do a sort because the caller would expect the function to take little time and memory to execute. By making you do the sort yourself, the DB provider knows you know what your doing.

25   Dan8267   2012 Feb 3, 1:03pm  

I've played around with the data you posted. I decided that the most useful way to visualize the data is using Google Earth. Basically, I combined your data with latitude and longitude and produced the following. Red is a zip code with a high price/rent ratio (buying is bad, renting is good) and green is the opposite (buying good, renting bad).

Of course, the best way to view this data is to load it in Google Earth. To do that just click on one of the files in the following .zip file.

http://patrick.net/uploads/2012/02/price_to_rent_enhanced.zip

Now the data set leaves much to be desired. Since I can't really compare apples to apples with it, the results are only so-so. Of course, with more detailed data, you could put together a far more meaningful and useful graph. So I'll explain what I did in detail next. But first I want to make sure my uploads worked because they were acting a bit funny. I hope they aren't too big. And I'm not sure the .zip file took.

26   Dan8267   2012 Feb 3, 1:08pm  

Looks like there's a file size limit to the image upload function, but I got all the images up now.

The .zip file didn't take. I guess you can only upload images and if the web server can't parse the image and resize it, the web server drops the file.

No matter. I'll just show you how I produced the file.

27   Dan8267   2012 Feb 3, 1:22pm  

So I downloaded your price_to_rent.txt file and a file containing geo-zip info zipcode.csv from http://www.boutell.com/zipcodes/

The format of zipcode.csv is

"zip","city","state","latitude","longitude","timezone","dst"
"00210","Portsmouth","NH","43.005895","-71.013202","-5","1"
"00211","Portsmouth","NH","43.005895","-71.013202","-5","1"
"00606","Maricao","PR","18.182151","-66.9588","-4","0"
"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"

"00609","Aibonito","PR","18.142002","-66.273278","-4","0"
"00610","Anasco","PR","18.288319","-67.13604","-4","0"
"00611","Angeles","PR","18.279531","-66.80217","-4","0"

The file has a header, a bunch of useless quotes, and blank lines. The zip codes are padded with leading zeros and it includes zip codes not contained in your data file (mainly non-state zips).

So I wrote a little program to combine your data with this filtering out unwanted stuff. Here's the code in C#.

Code goes here

As you can see, I've modified it a bit and reran it a few times to produce the different .kml files.

So what's a KML file? It's a markup language for adding objects to Google's Earth data model. In this case I'm just adding points of interests, but you can make arbitrary shapes and a whole lot of other things.

So here's an abbreviated chunk of what the KML files look like.

KML goes here

When you double-click a KML file, it loads up in Google Earth, assuming you've installed GE.

Well, that's about it. Let me know if you have any questions. It's fun to look at detailed maps of price:rent ratios, but as I said the data source is a bit misleading because you're comparing small rental condos and large mansions. That's why the outliers are so extreme in the data, and I had to cap the max in the source.

28   Dan8267   2012 Feb 3, 1:29pm  

I couldn't paste the code or KML file as text, so I'm trying images. Evidently some of the characters cause the posting to get rejected.

29   Dan8267   2012 Feb 3, 1:32pm  

Good, the code is readable if you zoom in. Now for the KML snippet.

30   Dan8267   2012 Feb 3, 1:39pm  

OK, you can now get the .zip file containing the KML files at http://www.filedropper.com/pricetorentenhanced

You only need to load one KML file. A created a bunch for convenience since Google Earth doesn't let you turn labels on/off for points of interests and since you may performance issues with the labels on.

31   swebb   2012 Feb 3, 2:08pm  


Fri, 3 Feb 2012 at 12:12 pm Quote Like Flag Permalink Share

bmwman91 says

What is your input data comprised of? Is it 3 columns (zip code, avg sales price, avg rent price)?

Also, I have some hefty statistical analysis software at my disposal. If you can package up the data or get me access to the database, I can import it & run some analyses on it over one of my lunch breaks.

Sure, that would be fun! Here is the data:

http://patrick.net/contrib/price_to_rent.txt

It's about 400KB in size.

Is this the rawest data you have access to, or is this just some particular query that you did?

What I think would be interesting is for you to compute a price/rent ratio for *like* houses. (?Not sure if you have this data or not) In other words get a price to rent ratio for 1 BR condos, 2 BR condos, 3 BR SFH, etc.. You'd end up with data that you could plot (for a given zip code or region) the price to rent ratio on the Y axis, and (say) the number of bedrooms on the X...or the rent (or price) on the X axis...

Again, that takes more data than the 3 columns, so maybe not possible...

32   swebb   2012 Feb 3, 2:13pm  

Dan8267 says

Befriend or Ignore
Friends: 8
Threads: 279
Comments: 1,982
Boca Raton, FL
Fri, 3 Feb 2012 at 9:03 pm Quote Like Flag Permalink Share

I've played around with the data you posted. I decided that the most useful way to visualize the data is using Google Earth.

Wow. That's great. So I have to have GE running on my computer to use the data? (no web-acessible app that can ingest the KML file and then "zoom in" to 80206?)

It's probably time for me to figure out if there is a GE client for OS X.

33   Dan8267   2012 Feb 3, 2:19pm  

swebb says

So I have to have GE running on my computer to use the data?

Technically, no. You can always use data however you want. But if you want to see the data on the world map, yeah, you'll need to install Google Earth. Google doesn't have an interface to view KML files via Google Maps online.

You can get Google Earth from http://www.google.com/earth/download/ge/

And yes, there is a Mac version. When you click download, the web server will automatically give you the version of GE that matches your OS.

Once you install GE, just click on a KML file to see the results launched in GE.

34   B.A.C.A.H.   2012 Feb 3, 3:54pm  


So what should I plot exactly?

Why do you give a rats' ass what someone who self-identifies with an American Express black card thinks?

35   B.A.C.A.H.   2012 Feb 3, 4:00pm  

It is a fascinating plot.

There may be some meaning of the extrapolation to zero.

It looks like your plot extrapolates to average rent of $200 to $300 if the property is worth zero.

But maybe, the rent, which is for the most part related to wages, should be the independent axis, and the average price, which is affected by wages AND other factors, should be the dependent axis. In that case, it looks like zero rent would be correlated to negative $100,000, whatever that might mean, who knows?

36   Dan8267   2012 Feb 4, 12:34am  

Taking a look at the south Florida map, I think there is so much green simply because of the tiny apartments/condos that you can rent. If only I had the sq ft. of each rental, I'd make a calculated column that was the price/sq.ft. or rent/sq.ft. Although not perfect, that would make for a better analysis.

37   Patrick   2012 Feb 4, 1:47am  

Wow Dan, you're a mapping maniac!

Do you know how to do boundaries in Google maps with KML?

I've been wanting to plot all the boundaries of school districts and then make a map you can click on or enter an address to say for any given address what school district it's in. Or just view school district boundaries.

Here's the raw data (census "shapefiles") for non-unified elementary and secondary school districts:

ftp://ftp2.census.gov/geo/tiger/TIGER2010/ELSD/2010/
ftp://ftp2.census.gov/geo/tiger/TIGER2010/SCSD/2010/

And there's the data for unified school districts (which have all school levels together in one district):

ftp://ftp2.census.gov/geo/tiger/TIGER2010/UNSD/2010/

38   Rent4Ever   2012 Feb 4, 2:27am  

This is all very cool.

39   Dan8267   2012 Feb 4, 4:33am  


Do you know how to do boundaries in Google maps with KML?

I would use polygons to make a rectangular fence around the borders. Here's an example using the Pentagon. In the example, I've replaced angle brackets with square brackets so I can paste the text on the forum. Also note that the XML tags are case sensitive even though the XSD is inconsistent in which casing convention it uses.

The Style node is optional, but if you don't include it, you get a solid black polygon. The inner boundaries are also optional, but if you omit them you get a solid shape (no hole). Finally, note that you repeat the first coordinates in order to close the path of the polygon.

The KML reference is available at http://code.google.com/apis/kml/documentation/kmlreference.html.

[?xml version="1.0" encoding="UTF-8"?]
[kml xmlns="http://www.opengis.net/kml/2.2"]
  [Placemark]
    [name]The Pentagon[/name]
    [Style id="s0"]
        [PolyStyle]
            [!-- Color is in the format AABBGGRR, 0 alpha is transparent --]
            [color]80FF0000[/color]
        [/PolyStyle]
    [/Style]
    [Polygon]
      [!-- Extrude 1 means make the shape go to the ground --]
      [extrude]1[/extrude]
      [altitudeMode]relativeToGround[/altitudeMode]
      [!-- Outer boundaries make the pentagon shape --]
      [outerBoundaryIs]
        [LinearRing]
          [!-- Each line is longitude, latitude, altitude --]
          [coordinates]
            -77.05788457660967,38.87253259892824,100
            -77.05465973756702,38.87291016281703,100
            -77.05315536854791,38.87053267794386,100
            -77.05552622493516,38.868757801256,100
            -77.05844056290393,38.86996206506943,100
            -77.05788457660967,38.87253259892824,100
          [/coordinates]
        [/LinearRing]
      [/outerBoundaryIs]
      [!-- Inner boundaries makes the hole in the center --]
      [innerBoundaryIs]
        [LinearRing]
          [coordinates]
            -77.05668055019126,38.87154239798456,100
            -77.05542625960818,38.87167890344077,100
            -77.05485125901024,38.87076535397792,100
            -77.05577677433152,38.87008686581446,100
            -77.05691162017543,38.87054446963351,100
            -77.05668055019126,38.87154239798456,100
          [/coordinates]
        [/LinearRing]
      [/innerBoundaryIs]
    [/Polygon]
  [/Placemark]
[/kml]

Here's the result:

40   swebb   2012 Feb 4, 5:24am  

B.A.C.A.H. says

Why do you give a rats' ass what someone who self-identifies with an American Express black card thinks?

HA! I understand why the AmEx Black card image is a turn off, but that's not enough reason to dismiss what the guy has to say...I take his input and try to ignore the picture...something about babies and bathwater...

41   Wacking Hut   2012 Feb 4, 7:53am  

Price per square foot is how you would normalize across units, if you have that data.

I'm going to set to work on that data set.

42   swebb   2012 Feb 4, 8:32am  

Wacking Hut says

Price per square foot is how you would normalize across units, if you have that data.

That's better than nothing, for sure, but in my world square footage is a secondary issue. Number of bedrooms and type of dwelling (SFH vs duplex vs high-rise) are both more important than square footage...so price to rent normalized by square footage is great if you first make sure you are comparing like units.

It would be helpful to filter based on anomalous square footages, too...really small, vacant lot, etc.

43   Patrick   2012 Feb 4, 8:46am  

Square footage is tough to get for millions of addresses.

Maybe I should ask people to input square footage into my calculator if they have it. That might get them a better rent-vs-buy estimate too.

« First        Comments 4 - 43 of 60       Last »     Search these comments

Please register to comment:

api   best comments   contact   latest images   memes   one year ago   random   suggestions   gaiste