Decoding Seattle city vibes behind Airbnb’s listing data

Claire Gong
5 min read · May 18, 2020

Background:

In this post, I share some insights from Airbnb’s 2016 Seattle listing dataset. Anyone interested in how it was done can check the code in my repo.

2016 may sound like a long time ago, but many trends persist. Seattle listings that were priced above average in 2016 likely share many qualities with their counterparts in New York in 2020, and the seasonal and neighbourhood patterns in Seattle are likely still relevant. So, let’s dig into the mysteries behind Seattle’s listing data.

What are the top contributors to listing price in Seattle?

There are a lot of features to start with: even after heavy pruning during data wrangling, 60 features remain. Visualisation is a good way to explore them.
For example, below is a heat map in which each cell’s colour indicates how strongly two (numerical) features are correlated. The target is mean_price, so fix your eyes on the last column, then check the colour bar and the vertical labels.
The boxes with warmer shades in the right-most column correspond to features such as accommodates, the number of rooms and beds, and the cleaning fee.
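To make the "last column" idea concrete, here is a minimal sketch of how such a correlation table can be computed with pandas. The data is synthetic and the column names (other than mean_price) are assumptions for illustration, not the actual cleaned dataset from the repo.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned listings table.
# Column names and the price formula are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 200
accommodates = rng.integers(1, 9, size=n)
listings = pd.DataFrame({
    "accommodates": accommodates,
    "bedrooms": np.ceil(accommodates / 2),
    "cleaning_fee": rng.uniform(0, 100, size=n),
    "review_scores_rating": rng.uniform(80, 100, size=n),
})
listings["mean_price"] = (
    30 * listings["accommodates"]
    + 0.5 * listings["cleaning_fee"]
    + rng.normal(0, 20, size=n)
)

# Pairwise Pearson correlations between numerical columns:
# this matrix is what the heat map colours encode.
corr = listings.corr()

# The "last column" of the heat map: each feature's correlation with the target.
price_corr = corr["mean_price"].drop("mean_price").sort_values(ascending=False)
print(price_corr)
```

Passing `corr` to a plotting function such as seaborn's `heatmap` would give a picture similar to the one described above.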

Below is a second graph showing correlations that involve the categorical features; you can see how many features we are dealing with. Again reading from the right-most column, room type and cancellation policy play some part.

Relationships between categorical features and price can also be shown in a straightforward way as below:
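One simple way to look at such a relationship in code is a group-by summary. The sketch below uses a handful of made-up rows; the column names room_type and mean_price are assumptions, and the numbers are not results from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; not values from the Seattle dataset.
listings = pd.DataFrame({
    "room_type": [
        "Entire home/apt", "Private room", "Shared room",
        "Entire home/apt", "Private room",
    ],
    "mean_price": [150.0, 70.0, 40.0, 170.0, 80.0],
})

# Summarise the price within each category, most expensive category first.
price_by_room = (
    listings.groupby("room_type")["mean_price"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(price_by_room)
```

The same pattern works for any categorical column, for example the neighbourhood or the cancellation policy.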

In conclusion, listings with the qualities below cost more in Seattle. Please note that correlation does not imply causation, but we can still learn something from it.

  1. More bedrooms
  2. More people can be accommodated
  3. Extra cleaning fee
  4. Located in a more expensive neighbourhood; the top 3 are: Downtown > Queen Anne > Capitol Hill
  5. More bathrooms (sure, more bathrooms come naturally if the house already has a few bedrooms)
  6. Entire apartment for rent
  7. Availability (flexible booking dates)
  8. The host has more listings (this may indicate that the listing is better taken care of or more comfortable)
  9. Higher review score of the listing

The price predictor using a machine learning model (Ridge Regression)

Machine learning models are used either to predict labelled data (supervised ML) or to group unlabelled data (unsupervised ML); predicting continuous values such as prices is a regression problem.
The go-to algorithms for regression problems are linear regression models with some form of regularisation, which helps select features or downplay the importance of certain ones. In my analysis, about 60 features remain after heavy pruning, so I can use Lasso to select features or Ridge for straightforward regularisation.
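The difference between the two regularisers can be sketched with scikit-learn on synthetic data. This is an illustration under stated assumptions, not the model from the repo: the 60 random features, the alpha values and the scaling step are all choices made for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 60 features, of which only the first 5 really drive the price.
rng = np.random.default_rng(42)
n_samples, n_features = 500, 60
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [40.0, 25.0, 15.0, 10.0, 5.0]
y = 120 + X @ true_coef + rng.normal(0, 10, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale features first so the penalty treats every coefficient equally.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X_train, y_train)

# R^2 on the held-out test split.
ridge_r2 = ridge.score(X_test, y_test)
lasso_r2 = lasso.score(X_test, y_test)

# Ridge shrinks coefficients but keeps them all; Lasso sets some exactly to zero.
n_ridge_kept = int(np.sum(ridge.named_steps["ridge"].coef_ != 0))
n_lasso_kept = int(np.sum(lasso.named_steps["lasso"].coef_ != 0))
print(f"Ridge R^2={ridge_r2:.3f}, non-zero coefficients: {n_ridge_kept}")
print(f"Lasso R^2={lasso_r2:.3f}, non-zero coefficients: {n_lasso_kept}")
```

In practice the penalty strength alpha would be tuned, for example with `RidgeCV` or `LassoCV`, rather than fixed at 1.0 as here.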

Below is the graph of predicted price vs actual price on the test subset.

There is an official guide from scikit-learn on choosing ML models based on the problem; a screenshot of the part applicable to this problem is below:

What types of listings have higher review scores?

A higher review score means that guests enjoyed their stay, so the host can ask for a better price and is likely to attract more bookings in the long term. So hosts had better start taking notes now.

First, listings with the qualities below tend to have better review scores:

  1. Host is certified as Airbnb superhost (which means more experience, better service etc. by Airbnb’s standards)
  2. Host is more responsive (higher response rate and shorter response time)
  3. There are more amenities (fridge, parking, aircon etc., you name it) in the house
  4. Central area
  5. Certain property types (you can get a score of 100 for listing a YURT! Or, for a slightly lower score, a bungalow will do)
  6. Private room type
  7. Flexible cancellation policy

The graph below says it all:

In other words, to keep guests happy, hosts should try to avoid the following:

  1. Responding only after a few days or more (do you want to rent the house or not?)
  2. University District (bad choice, but the students keep coming?)
  3. Certain property types, such as dorms and chalets
  4. Shared room type
  5. Strict cancellation policy

How does the price change over one year?

Combining the calendar and listing data, the monthly price trends of Seattle’s listings are drawn below, with each neighbourhood in a different colour.

As the plot shows, in most neighbourhoods the price peaks in July, and in a few it peaks in August. In Downtown especially, prices from June to August are much higher than in other months. Okay, you may cross those months off your trip planner now.
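For readers curious how a monthly trend like this can be derived, here is a rough sketch with pandas. The tiny tables are made up, and the column names (listing_id, date, price, id, neighbourhood) are assumptions about the calendar and listings files rather than confirmed details of the repo code.

```python
import pandas as pd

# Made-up calendar and listings rows; column names are assumed.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2],
    "date": ["2016-01-15", "2016-07-15", "2016-07-16",
             "2016-01-15", "2016-07-15", "2016-08-15"],
    "price": ["$100.00", "$150.00", "$170.00", "$80.00", "$95.00", "$90.00"],
})
listings = pd.DataFrame({
    "id": [1, 2],
    "neighbourhood": ["Downtown", "Capitol Hill"],
})

# Prices stored as strings like "$1,200.00" need the symbols stripped first.
calendar["price"] = (
    calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)
calendar["month"] = pd.to_datetime(calendar["date"]).dt.month

# Attach the neighbourhood to every calendar row, then average per month.
merged = calendar.merge(listings, left_on="listing_id", right_on="id")
monthly = (
    merged.groupby(["neighbourhood", "month"])["price"]
    .mean()
    .unstack("neighbourhood")
)

# Month with the highest average price in each neighbourhood.
peak_month = monthly.idxmax()
print(monthly)
print(peak_month)
```

Calling `monthly.plot()` on the resulting table draws one line per neighbourhood, which is the shape of the chart discussed above.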

What are the busiest times of the year to visit Seattle?

Using the reviews dataset, I estimated the volume of visitors staying in Seattle homestays. Below is a graph showing the monthly (estimated) volume of guests visiting Seattle.

Compared with price, the number of visitors to Seattle changes greatly within a year. From the graph above, August is the busiest time for visitors in most areas of Seattle. The peak-price month, July, comes one month before the busiest month, but the two overlap around July to August. Logically, peak tourism season is also the most expensive.
However, there is an odd one out: Capitol Hill, where May and September are the peak times. Perhaps a Seattle local can tell me what’s so special over there at these times.
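One plausible way to build such an estimate is to treat each review as a proxy for one stay and count reviews per month. The sketch below assumes this proxy and uses made-up rows; the column names are assumptions, and the counts are not results from the dataset.

```python
import pandas as pd

# Made-up reviews and listings rows; column names are assumed.
reviews = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2, 3],
    "date": ["2016-05-03", "2016-05-20", "2016-08-10",
             "2016-08-12", "2016-08-20", "2016-09-01", "2016-08-05"],
})
listings = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood": ["Capitol Hill", "Downtown", "Downtown"],
})

reviews["month"] = pd.to_datetime(reviews["date"]).dt.month

# Assumption: one review stands in for one stay, so counting reviews
# per neighbourhood and month approximates guest volume.
merged = reviews.merge(listings, left_on="listing_id", right_on="id")
visits = (
    merged.groupby(["neighbourhood", "month"])
    .size()
    .unstack("neighbourhood", fill_value=0)
)

# Busiest month per neighbourhood in this toy data.
busiest = visits.idxmax()
print(visits)
print(busiest)
```

Not every guest leaves a review, so counts like these underestimate real traffic; they are mainly useful for comparing months against each other.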

Conclusion:

Let’s recap the key points:
1. Seattle’s Airbnb listing prices are mostly determined by the size of the house, fees and neighbourhood.
2. Guest experience is mostly determined by the host’s attitude, followed by amenities, location and property type.
3. Seattle’s Airbnb listings are most expensive in July.
4. Seattle is most visited in August, except for Capitol Hill, where May and September are the peak months.
