« First « Previous Comments 30 - 60 of 60 Search these comments
OK, you can now get the .zip file containing the KML files at http://www.filedropper.com/pricetorentenhanced
You only need to load one KML file. I created a bunch for convenience, since Google Earth doesn't let you turn labels on/off for points of interest and since you may have performance issues with the labels on.
Fri, 3 Feb 2012 at 12:12 pm
bmwman91 says
What is your input data comprised of? Is it 3 columns (zip code, avg sales price, avg rent price)?
Also, I have some hefty statistical analysis software at my disposal. If you can package up the data or get me access to the database, I can import it & run some analyses on it over one of my lunch breaks.
Sure, that would be fun! Here is the data:
http://patrick.net/contrib/price_to_rent.txt
It's about 400KB in size.
Is this the rawest data you have access to, or is this just some particular query that you did?
What I think would be interesting is for you to compute a price/rent ratio for *like* houses. (Not sure if you have this data or not.) In other words, get a price to rent ratio for 1 BR condos, 2 BR condos, 3 BR SFH, etc. You'd end up with data that you could plot for a given zip code or region, with the price to rent ratio on the Y axis and (say) the number of bedrooms on the X axis, or the rent (or price) on the X axis...
Again, that takes more data than the 3 columns, so maybe not possible...
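If the extra columns did exist, the like-for-like ratio could be sketched roughly as below. This is only an illustration under assumptions: the row layout (zip code, unit type, amount), the helper name price_to_rent_by_group, and every number are hypothetical, and the ratio is taken as average price divided by average annual rent.

```python
from collections import defaultdict
from statistics import mean

def price_to_rent_by_group(sales, rentals):
    """Return {(zip_code, unit_type): average price / average annual rent}.

    Both inputs are lists of (zip_code, unit_type, amount) tuples.
    This layout is an assumption, not the format of price_to_rent.txt.
    """
    prices = defaultdict(list)
    rents = defaultdict(list)
    for zip_code, unit_type, price in sales:
        prices[(zip_code, unit_type)].append(price)
    for zip_code, unit_type, monthly_rent in rentals:
        rents[(zip_code, unit_type)].append(monthly_rent)

    ratios = {}
    # Only groups that have both sale and rental listings can be compared.
    for key in prices.keys() & rents.keys():
        ratios[key] = mean(prices[key]) / (12 * mean(rents[key]))
    return ratios

# Made-up listings for illustration only.
sales = [
    ("80206", "2BR condo", 240000),
    ("80206", "2BR condo", 260000),
    ("80206", "3BR SFH", 450000),
]
rentals = [
    ("80206", "2BR condo", 1500),
    ("80206", "2BR condo", 1700),
    ("80206", "3BR SFH", 2500),
]

ratios = price_to_rent_by_group(sales, rentals)
for (zip_code, unit_type), ratio in sorted(ratios.items()):
    print(zip_code, unit_type, round(ratio, 2))
```

Grouping on the pair (zip code, unit type) is what keeps a studio from being compared against a single-family house in the same zip.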
Fri, 3 Feb 2012 at 9:03 pm
I've played around with the data you posted. I decided that the most useful way to visualize the data is using Google Earth.
Wow. That's great. So I have to have GE running on my computer to use the data? (No web-accessible app that can ingest the KML file and then "zoom in" to 80206?)
It's probably time for me to figure out if there is a GE client for OS X.
So I have to have GE running on my computer to use the data?
Technically, no. You can always use data however you want. But if you want to see the data on the world map, yeah, you'll need to install Google Earth. Google doesn't have an interface to view KML files via Google Maps online.
You can get Google Earth from http://www.google.com/earth/download/ge/
And yes, there is a Mac version. When you click download, the web server will automatically give you the version of GE that matches your OS.
Once you install GE, just click on a KML file to see the results launched in GE.
It is a fascinating plot.
There may be some meaning of the extrapolation to zero.
It looks like your plot extrapolates to average rent of $200 to $300 if the property is worth zero.
But maybe, the rent, which is for the most part related to wages, should be the independent axis, and the average price, which is affected by wages AND other factors, should be the dependent axis. In that case, it looks like zero rent would be correlated to negative $100,000, whatever that might mean, who knows?
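To see why swapping the axes changes the extrapolation to zero, here is a small least-squares sketch. The four (price, rent) points are invented for illustration and fit_line is a hypothetical helper, so the intercepts it prints are not the ones from the actual plot.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented averages, NOT taken from the thread's data set.
prices = [100000, 200000, 300000, 400000]
rents = [800, 1200, 1700, 2100]

# Rent as the dependent variable: the intercept is the rent at zero price.
_, rent_at_zero_price = fit_line(prices, rents)

# Price as the dependent variable: the intercept is the price at zero rent.
_, price_at_zero_rent = fit_line(rents, prices)

print(round(rent_at_zero_price), round(price_at_zero_rent))
```

The two fits are not inverses of each other, because each one minimizes error along a different axis. That is why one direction can give a positive rent at zero price while the other gives a negative price at zero rent.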
Taking a look at the south Florida map, I think there is so much green simply because of the tiny apartments/condos that you can rent. If only I had the sq ft. of each rental, I'd make a calculated column that was the price/sq.ft. or rent/sq.ft. Although not perfect, that would make for a better analysis.
Wow Dan, you're a mapping maniac!
Do you know how to do boundaries in Google maps with KML?
I've been wanting to plot all the boundaries of school districts and then make a map you can click on or enter an address to say for any given address what school district it's in. Or just view school district boundaries.
Here's the raw data (census "shapefiles") for non-unified elementary and secondary school districts:
ftp://ftp2.census.gov/geo/tiger/TIGER2010/ELSD/2010/
ftp://ftp2.census.gov/geo/tiger/TIGER2010/SCSD/2010/
And there's the data for unified school districts (which have all school levels together in one district):
ftp://ftp2.census.gov/geo/tiger/TIGER2010/UNSD/2010/
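For the "which district is this address in" part, one common approach is a ray-casting point-in-polygon test. The sketch below assumes the address has already been geocoded to a longitude/latitude and that each district boundary has already been pulled out of the shapefile as a list of (lon, lat) points; the district names, coordinates, and function names are all hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many edges a ray heading east from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def find_district(lon, lat, districts):
    """Return the name of the first district whose boundary contains the point."""
    for name, ring in districts.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None

# Two made-up rectangular districts sharing an edge.
districts = {
    "District A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "District B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

print(find_district(1, 1, districts))
print(find_district(3, 1, districts))
print(find_district(5, 1, districts))
```

Real district boundaries can have holes and multiple parts, which this sketch ignores.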
Do you know how to do boundaries in Google maps with KML?
I would use polygons to make a rectangular fence around the borders. Here's an example using the Pentagon. In the example, I've replaced angle brackets with square brackets so I can paste the text on the forum. Also note that the XML tags are case sensitive even though the XSD is inconsistent in which casing convention it uses.
The Style node is optional, but if you don't include it, you get a solid black polygon. The inner boundaries are also optional, but if you omit them you get a solid shape (no hole). Finally, note that you repeat the first coordinates in order to close the path of the polygon.
The KML reference is available at http://code.google.com/apis/kml/documentation/kmlreference.html.
[?xml version="1.0" encoding="UTF-8"?]
[kml xmlns="http://www.opengis.net/kml/2.2"]
  [Placemark]
    [name]The Pentagon[/name]
    [Style id="s0"]
      [PolyStyle]
        [!-- Color is in the format AABBGGRR, 0 alpha is transparent --]
        [color]80FF0000[/color]
      [/PolyStyle]
    [/Style]
    [Polygon]
      [!-- Extrude 1 means make the shape go to the ground --]
      [extrude]1[/extrude]
      [altitudeMode]relativeToGround[/altitudeMode]
      [!-- Outer boundaries make the pentagon shape --]
      [outerBoundaryIs]
        [LinearRing]
          [!-- Each line is longitude, latitude, altitude --]
          [coordinates]
            -77.05788457660967,38.87253259892824,100
            -77.05465973756702,38.87291016281703,100
            -77.05315536854791,38.87053267794386,100
            -77.05552622493516,38.868757801256,100
            -77.05844056290393,38.86996206506943,100
            -77.05788457660967,38.87253259892824,100
          [/coordinates]
        [/LinearRing]
      [/outerBoundaryIs]
      [!-- Inner boundaries make the hole in the center --]
      [innerBoundaryIs]
        [LinearRing]
          [coordinates]
            -77.05668055019126,38.87154239798456,100
            -77.05542625960818,38.87167890344077,100
            -77.05485125901024,38.87076535397792,100
            -77.05577677433152,38.87008686581446,100
            -77.05691162017543,38.87054446963351,100
            -77.05668055019126,38.87154239798456,100
          [/coordinates]
        [/LinearRing]
      [/innerBoundaryIs]
    [/Polygon]
  [/Placemark]
[/kml]
Here's the result:
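For hundreds of district boundaries you would probably want to generate the KML rather than type it by hand. Here is a rough Python sketch that builds the same kind of Placemark with real angle brackets. The function names and the sample square are made up for illustration; only the element names and the AABBGGRR color order come from the example above.

```python
from xml.sax.saxutils import escape

def kml_color(red, green, blue, alpha=255):
    """Pack 0-255 channel values into KML's AABBGGRR hex order."""
    return f"{alpha:02X}{blue:02X}{green:02X}{red:02X}"

def ring_text(points, altitude=100):
    """Format (lon, lat) pairs as KML coordinates, closing the ring if needed."""
    points = list(points)
    if points[0] != points[-1]:
        points.append(points[0])  # repeat the first point to close the path
    return " ".join(f"{lon},{lat},{altitude}" for lon, lat in points)

def boundary(tag, points):
    """Wrap one ring in an outerBoundaryIs or innerBoundaryIs element."""
    return (f"<{tag}><LinearRing><coordinates>{ring_text(points)}"
            f"</coordinates></LinearRing></{tag}>")

def polygon_kml(name, outer, inner=None, color="80FF0000"):
    """Build a one-Placemark KML document; the inner ring (hole) is optional."""
    hole = boundary("innerBoundaryIs", inner) if inner else ""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Placemark>'
        f"<name>{escape(name)}</name>"
        f"<Style><PolyStyle><color>{color}</color></PolyStyle></Style>"
        "<Polygon><extrude>1</extrude>"
        "<altitudeMode>relativeToGround</altitudeMode>"
        f"{boundary('outerBoundaryIs', outer)}{hole}"
        "</Polygon></Placemark></kml>"
    )

# A made-up square, not a real district boundary.
square = [(-77.06, 38.87), (-77.05, 38.87), (-77.05, 38.86), (-77.06, 38.86)]
kml = polygon_kml("Hypothetical district", square, color=kml_color(0, 0, 255, 128))
print(kml)
```

Here kml_color(0, 0, 255, 128) gives 80FF0000, the same half-transparent blue used in the Pentagon example, and ring_text repeats the first point so the caller doesn't have to remember to close the path.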
Why do you give a rat's ass what someone who self-identifies with an American Express black card thinks?
HA! I understand why the AmEx Black card image is a turn off, but that's not enough reason to dismiss what the guy has to say...I take his input and try to ignore the picture...something about babies and bathwater...
Price per square foot is how you would normalize across units, if you have that data.
I'm going to set to work on that data set.
Price per square foot is how you would normalize across units, if you have that data.
That's better than nothing, for sure, but in my world square footage is a secondary issue. Number of bedrooms and type of dwelling (SFH vs duplex vs high-rise) are both more important than square footage...so price to rent normalized by square footage is great if you first make sure you are comparing like units.
It would be helpful to filter based on anomalous square footages, too...really small, vacant lot, etc.
Square footage is tough to get for millions of addresses.
Maybe I should ask people to input square footage into my calculator if they have it. That might get them a better rent-vs-buy estimate too.
Okay you statistics wonks - you do know that linear regression is not usually a good idea if the data set is heteroscedastic!! There also appears to be a high degree of multicollinearity in the model.....
HA! I understand why the AmEx Black card image is a turn off, but that's not enough reason to dismiss what the guy has to say...I take his input and try to ignore the picture...something about babies and bathwater...
I don't think so.
It's not like SFAce has a birthmark on his face that resembles a Black Card. It is a Choice.
If my kids, when they were babies, flaunted their self-identity with a Black Card, I'd have thought about throwing them out with their bath water.
This graph isn't linear. It's y = square_root(x).
Somehow most of this thread missed StoutFiles' point, which seems like the most interesting point of visualizing this data. What if analysis of this data fits an upside down hockey stick?
I just don't see how that could be.
Rents and prices seem to have a pretty clear linear relationship to each other.
This is great. Combining my favorite things in life; computer geek stuff and the housing crash!
Hey Patrick, want to break the news on the next property bubble? Check out these prices:
http://www.extension.iastate.edu/agdm/wholefarm/html/c2-09.html
Farmland prices are going exponential and decoupling from rents, just like residential real estate in 2004-2005.
I blame Michael Burry.
I just don't see how that could be.
Rents and prices seem to have a pretty clear linear relationship to each other.
I thought so too, but, when it was suggested, could also see a bit of a lower grouping to the right. Use the data analysis tools in Excel and have it draw a best fit curve.
It does look pretty linear. A droop to the right would suggest a thought I've often had, which is that the higher up you go on the sale value of the property, the more you rent for less. For example, going from a studio to a one bedroom isn't twice as much rent. Going from a one bedroom to a two bedroom is less of a rent increase still. Going to a SFH can be a very small increase annually, over renting a two-bedroom apartment in a multi-dwelling building.
OK Patrick, I finally played around with the data a little. It isn't going to reveal anything new, but it is sort of fun to fart around with the data & see what can be determined (or more importantly, what CAN'T be determined) from it.
So, first of all it is a good idea to look at the distribution of the data sets and get a feel for them. At first glance they might look like moderately-skewed normal distributions with some really high outliers. Well, that isn't the case, and they are VERY much log-normally distributed. The top of the plots shows a bunch of stuff (confidence interval of mean, quantile bars, some other stuff we won't worry about), but most important are the clouds of black dots. Those are the outliers...as-calculated for a normally distributed data set. This is not one, so they aren't the actual outliers.
So, I took the log10 of price & rent values and plotted the distribution of those, since they are log-normally distributed. When you do that, suddenly your data set will give a far better resemblance to a normal distribution. This is confirmed by the fact that a normal distribution line is fit to the data and that it fits very well. The goodness of fit part might seem a little confusing or contradictory, and that seems to be an issue in this software (SAS JMP 9.0). Anyway, if the "Prob>D" value is small & has a "*" next to it, then the fit line is very well suited to the data.
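For anyone without JMP, the effect of the log10 transform can be illustrated in plain Python by comparing skewness before and after. This is only a sketch on simulated log-normal prices (the $250k center, the 0.6 spread, and the seed are arbitrary assumptions), not a rerun of the analysis on the real data set.

```python
import math
import random

def skewness(values):
    """Population skewness: near 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return sum(((v - m) / s) ** 3 for v in values) / n

random.seed(0)  # fixed seed so the run is repeatable

# Simulated prices, NOT the real data: log-normal centered near $250k.
prices = [random.lognormvariate(math.log(250_000), 0.6) for _ in range(5000)]
log_prices = [math.log10(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)
print(f"raw skew: {raw_skew:.2f}, log10 skew: {log_skew:.2f}")
```

A strongly positive skew on the raw values that drops to roughly zero after taking log10 is the pattern described above: the long right tail goes away and the distribution looks approximately normal.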
Now we can select the outliers and exclude them from the analysis. Since the two data sets do overlap, it makes things a little messy and with the new mean/stdev it looks like there are still outliers. Those are ignored at this point since we don't want to go mixing variables up during sorting.
So, now it is time to plot something that probably looks familiar in this thread. Rent vs. Price. Is there a statistically significant link between them?
NO! Here is a brief explanation of what you are looking at.
- The magenta ellipse contains 95% of the data points.
- The blue ellipse contains 50% of the data points. These are just to give a feel for where the points mostly lie since there is this giant cloud.
- The red stuff pertains to a linear fit to the data.
- The green stuff pertains to a log(price) fit to the data.
- The darker, narrow shaded regions by the lines are the 95% confidence-of-fit intervals. These basically say, "For the true population of sales prices and rents, we are confident that the fit line will be within this region 95% of the time. The other 5% of the time, we don't know where it will be."
- The lighter, wide shaded regions are the 95% confidence intervals for the correlation between price and rent. These basically say, "For a given sales price, we are confident that the true population of rent costs lies within this colored band 95% of the time. The other 5% of the time, we don't know what they will be."
Discussion:
Being that the CIs for the price-rent correlation are gigantic, we can basically infer that there is no really solid link between prices and rent nationwide, at least given a sampling of the averages of both. I added the log(price) fit line in there because my intuition tells me that something like that is at play. People are willing to pay more per month as "owners" than they are as "renters." We see this all over the Bay Area. Low-tier properties rent for cash flow-positive amounts, while higher-end areas rent for a lot less than it would cost to buy with a mortgage. Then again, that is super location-specific, and perhaps other places don't follow this. According to the limited data set, it varies wildly across the US since no fit line really has any significance here. There just isn't a correlation between price & rent, across the US, if we are using the averages of both as the inputs.
Now, I would like to re-run this using the MEDIAN of both to see if that can tease any correlation out. Averages are flawed for these types of analysis for reasons that we have all read on here before.
If nothing else, just look at the r^2 values. They are pitifully low, which is really all the information anyone needs to know that the fit is really meaningless! There isn't a fit, just a line drawn over a cloud of uncorrelated data!
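For reference, the r^2 of a straight-line fit with an intercept is just the squared Pearson correlation, so it is easy to check outside JMP. The sketch below uses invented price and rent averages, and r_squared is a hypothetical helper name. It compares a fit on price with a fit on log10(price).

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation, which equals r^2 for a simple linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Invented zip-code averages for illustration only.
prices = [120000, 180000, 250000, 400000, 650000, 900000]
rents = [1400, 900, 1800, 1300, 2400, 1700]

r2_linear = r_squared(prices, rents)
r2_log = r_squared([math.log10(p) for p in prices], rents)
print(f"linear fit r^2: {r2_linear:.3f}, log(price) fit r^2: {r2_log:.3f}")
```

A value near 1 means the line explains most of the variation in rent. A value near 0 means the line is little better than just using the average rent everywhere.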
So, Patrick:
a) Can we get the same data, but with the medians of both?
b) Would it be possible to have an additional column with the 2-letter state code corresponding to the zip code?
c) Do you have some monstrous database with data for every rent listing and recorded sale that you pull this from?
EDIT:
Also, the log-distribution of prices & rents makes perfect sense. People are generally seeking low rents and low house prices, and most market activity falls around these "affordable" levels. There is VERY little available below this since people are seeking the low-end to begin with. Then you see the long tail tapering off into the higher & higher price points. That also makes perfect sense since there are a number of people out there that can afford more expensive stuff. As prices go up, the number of participants goes down, and you see the tapering of prices and rents as they increase. I guess I am a huge nerd, but I love it when data agrees with intuition, and particularly when it seems to explain human behavior.
Also, trying to plot price, rent or price/rent versus zip code isn't useful at all. Zip codes are nominal data, while price and rent are continuous. They can be analysed, and I tossed them into JMP, which treats nominal-X axis + continuous-Y axis data as a One-Way analysis. Nothing useful popped out of that, which is sort of to be expected. It "looks" like there is a bunch of expensive real estate on the west coast, but no statistically significant fits could be applied to the data set to back that up.
I took statistics in German in Munich and didn't really absorb much.
So, Patrick:
a) Can we get the same data, but with the medians of both?
b) Would it be possible to have an additional column with the 2-letter state code corresponding to the zip code?
c) Do you have some monstrous database with data for every rent listing and recorded sale that you pull this from?
I don't know an easy way to get medians from my database.
I could pretty easily put the 2-letter state code because I can just join with my zip code table.
Yes, I have a monstrous database with data for every rent and asking price from Craigslist for about a year.
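If the database itself has no convenient median function, one workaround is to pull the raw (zip code, amount) rows and compute the medians in a script. A minimal sketch follows, assuming the rows arrive as simple tuples; the row layout, the helper name median_by_zip, and the sample rents are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

def median_by_zip(rows):
    """Group (zip_code, amount) rows by zip code and return the median of each group."""
    grouped = defaultdict(list)
    for zip_code, amount in rows:
        grouped[zip_code].append(amount)
    return {zip_code: median(amounts) for zip_code, amounts in grouped.items()}

# Made-up asking rents; the 9000 listing stands in for an outlier.
rent_rows = [
    ("33432", 1200),
    ("33432", 1500),
    ("33432", 9000),
    ("80206", 1000),
    ("80206", 1400),
]

medians = median_by_zip(rent_rows)
average_33432 = mean(amount for zip_code, amount in rent_rows if zip_code == "33432")
print(medians["33432"], average_33432)
```

In the made-up 33432 rows, one outlier pulls the average far above the median, which is the usual argument for preferring medians here.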
Then there is still the issue that median rent and median sale price are not the same thing.
This is true. Really, it seems like the only proper data set to work with would be one comprised of raw sales prices and rent values. Sampling methodology is a huge factor, and unless EVERY single sale & rental is taken, all sorts of sampling-related issues could mask trends and cause false ones to appear.
I have the x64 version of JMP & 16GB of RAM...I probably COULD work on the entire set of raw data. My gut feeling is that there is going to be some sort of correlation between price and rent without the masking effects of averaging or median filtering. Oh yeah, confirmation bias...about that, lol.
I could pretty easily put the 2-letter state code because I can just join with my zip code table.
Yes, I have a monstrous database with data for every rent and asking price from Craigslist for about a year.
Shoot me an email if you want to see what we can glean from the data. I know that this is sort of Patrick-proprietary stuff since you make a living off the site, so maybe we could make something interesting to share with readers without giving everyone all of your data.
JMP can access databases and stuff, so we wouldn't necessarily need to email monstrous text files.
Another complicating factor is that I don't actually have sales prices, just asking prices. Wish I did.
Asking rents are less of a problem, because the landlord usually gets exactly what he's asking.
Shoot me an email if you want to see what we can glean from the data. I know that this is sort of Patrick-proprietary stuff since you make a living off the site, so maybe we could make something interesting to share with readers without giving everyone all of your data.
OK, I'll mail you.
In my area I would LOVE to have the number of homes, apartments, commercial properties, and land parcels held by the same persons and groups. And how many of the individuals that hold rentals are also members of groups that hold rentals ..... and their financial history (foreclosure, BR, short sales, etc.) .... just a dream, I know.
Personally I think that this sort of information would be very useful. The other graph that I think would be great is rent vs. how likely the tenant is to destroy your investment property or not pay the rent.
Or
Price vs rental rate, as there's no point buying an investment if you can't rent it out.
Just for kicks, I plotted the average price vs average rent for every zip code in the US from my mostly-Craigslist data set of 4.5 million points.
Pretty cool, but I'm not sure what it means except that there is basically a linear relationship.
Please copy this graph and use it wherever you want, as long as you keep the http://patrick.net URL in the image.
Here are the gnuplot commands I used:
Anyone know how to make the text of the url label lighter, or how to put commas and $ marks on the rents and prices?