Using Data to Plan Your Next Boston Visit!

Andrew Angeles
Published in Analytics Vidhya
6 min read · Jun 23, 2021

This blog post is part of the Udacity Data Scientist Nanodegree Program.

source: https://unsplash.com/photos/Ic8B165N1og?utm_source=unsplash&utm_medium=referral&utm_content=creditShareLink

Have you ever wanted to visit a new city but felt overwhelmed by the planning? How would you get there, what would you do, and, most importantly, where would you stay? Using Airbnb data for Boston listings from Kaggle, this post aims to leverage data to help plan your next city adventure!

Primary Goals of the Analysis:

  • How does seasonality impact pricing and inventory within Boston?
  • How does review sentiment relate to features of a property?
  • What features best help predict the price of a listing?

Question 1: “How does seasonality impact pricing and inventory within Boston?”

Aside from weather and events, two key factors in deciding when to visit a place are the overall price of accommodations and the availability of inventory. To tackle this question, we can use the data in calendar.csv to track both the median listing price by month and the number of listings available.

The chart below shows the median price by month and indicates that the best pricing can be found in the first three months of the year. Although the median list price will likely trend higher from year to year, owing to the long-run appreciation of real estate, the seasonal pattern will likely persist, and January through March should continue to offer the best prices.

Median price used to minimize the impact of outliers
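As a rough illustration of the monthly median calculation, here is a minimal sketch that uses only the Python standard library. The rows are made-up samples, and the field layout (listing id, date, a 't'/'f' availability flag, and a price string such as "$1,200.00") is an assumption about calendar.csv rather than something stated in this post. The names `parse_price` and `median_price_by_month` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical rows mimicking calendar.csv:
# (listing_id, ISO date, availability flag, price string). Not real data.
SAMPLE_ROWS = [
    (1, "2017-01-05", "t", "$100.00"),
    (2, "2017-01-05", "t", "$150.00"),
    (3, "2017-01-06", "t", "$1,200.00"),  # outlier: does not drag the median up
    (1, "2017-09-10", "t", "$220.00"),
    (2, "2017-09-10", "f", ""),           # assumed: unavailable days carry no price
    (3, "2017-09-11", "t", "$260.00"),
]


def parse_price(text):
    """Turn a string such as '$1,200.00' into a float; return None when empty."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def median_price_by_month(rows):
    """Group available, priced rows by calendar month and take the median price."""
    prices = defaultdict(list)
    for _listing_id, day, available, price_text in rows:
        price = parse_price(price_text)
        if available == "t" and price is not None:
            prices[date.fromisoformat(day).month].append(price)
    return {month: median(values) for month, values in sorted(prices.items())}


monthly_median = median_price_by_month(SAMPLE_ROWS)
print(monthly_median)  # {1: 150.0, 9: 240.0}
```

In the sample, the $1,200 January listing would pull a mean up to about $483, while the median stays at $150, which is the reason for preferring the median here.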

Aside from good pricing, having a suitable inventory of accommodations to choose from also factors into deciding where and when to visit. To help assess the inventory available throughout the year, the chart below shows the daily average number of listings available within Boston for each month. Between November and February, roughly 2,000 units per day are available for booking on average, which suggests that visitors have the most accommodation options during those months.

The average number of daily listings was used instead of the total count of listings each month to control for the number of days in each month.
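The daily-average idea can be sketched as follows: count the available listings on each day, then average those daily counts within each month. This is a minimal standard-library sketch on made-up rows; the 't'/'f' availability flag and the helper name `average_daily_available` are assumptions, not details taken from the post.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (listing_id, ISO date, availability flag). Not real data.
SAMPLE_ROWS = [
    (1, "2017-01-01", "t"),
    (2, "2017-01-01", "t"),
    (1, "2017-01-02", "t"),
    (2, "2017-01-02", "f"),
    (1, "2017-02-01", "t"),
    (2, "2017-02-01", "f"),
]


def average_daily_available(rows):
    """Return {month: mean number of listings available per day in that month}."""
    available_per_day = defaultdict(int)
    for _listing_id, day, available in rows:
        # Adding 0 still registers the day, so days with no availability count.
        available_per_day[day] += 1 if available == "t" else 0

    daily_counts_by_month = defaultdict(list)
    for day, count in available_per_day.items():
        daily_counts_by_month[date.fromisoformat(day).month].append(count)

    return {
        month: sum(counts) / len(counts)
        for month, counts in sorted(daily_counts_by_month.items())
    }


daily_average = average_daily_available(SAMPLE_ROWS)
print(daily_average)  # {1: 1.5, 2: 1.0}
```

Dividing by the number of days observed in each month is what keeps a 31-day month from looking better stocked than a 28-day one.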

Overall, from a pricing and inventory perspective, January and February provide the best options for visitors. During these months, visitors have the most inventory to choose from and the best overall pricing environment. That said, those months are also typically the coldest, with an average temperature of roughly 30 degrees Fahrenheit…

Question 2: “Does location impact review sentiment?”

Reviews play an important role when deciding where to eat, what movies to watch, and where to stay. A run of negative reviews could indicate issues with the unit, host, or general location, while consistently positive reviews suggest the opposite. Using the review data and Python libraries, we compute the average sentiment for each listing and analyze how neighborhood relates to review sentiment.

There are 25 different neighborhoods within the listings dataset, and from the chart, we can see that the mean review sentiment varies greatly between neighborhoods, with the Leather District having the highest average score while Mission Hill has the lowest. This suggests that, all things being equal, booking a listing within the Leather District or Bay Village would provide a better overall experience than one in Fenway or Mission Hill. A quick check of some of the neighborhoods shows that areas with a higher average sentiment score are located closer to downtown and generally have more attractions than areas with lower sentiment scores.
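The two-step aggregation described above (average per listing, then average per neighborhood) can be sketched like this. The post does not name the sentiment library it used, so this sketch assumes each review has already been scored with a polarity value between -1 and 1. The scores, listing ids, and the helper name `mean_sentiment_by_neighborhood` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, pre-computed polarity scores: (listing_id, score in [-1, 1]).
REVIEW_SCORES = [(101, 0.8), (101, 0.6), (102, 0.2), (103, 0.1), (103, 0.3)]

# Hypothetical lookup from listing id to neighborhood.
NEIGHBORHOOD_OF = {
    101: "Leather District",
    102: "Mission Hill",
    103: "Mission Hill",
}


def mean_sentiment_by_neighborhood(review_scores, neighborhood_of):
    """Average scores per listing first, then average the listings per neighborhood."""
    scores_by_listing = defaultdict(list)
    for listing_id, score in review_scores:
        scores_by_listing[listing_id].append(score)

    listing_means_by_area = defaultdict(list)
    for listing_id, scores in scores_by_listing.items():
        listing_means_by_area[neighborhood_of[listing_id]].append(mean(scores))

    return {area: mean(values) for area, values in listing_means_by_area.items()}


sentiment = mean_sentiment_by_neighborhood(REVIEW_SCORES, NEIGHBORHOOD_OF)
best_area = max(sentiment, key=sentiment.get)
print(best_area)  # Leather District
```

Averaging per listing first means a single heavily reviewed listing counts once toward its neighborhood rather than once per review.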

Question 3: “What factors impact price?”

Earlier we saw that price is seasonal: prices moved meaningfully from month to month, with the earliest months of the year having the best pricing while also having the coldest weather. Assuming a trip in the dead of winter is not on your to-do list, how then do you find a deal?

In the charts below, we analyze price across property type and room type, and we see some meaningful and intuitive variation.

  • Property Type: It is clear that each property type commands a different price, and the results generally align with intuition. For example, all things being equal, one would expect a guesthouse to cost more than a dorm room.
  • Room Type: Room type shows a clear and intuitive pattern. Most people would agree that having the entire unit to oneself is more desirable than having only a room, which itself is more desirable than a shared room. The median price trends reflect this.

Pricing and Location

Location, location, location, as the mantra goes. Traditionally, in real estate, location plays a big role in a unit’s price, and the same should hold true for short-term accommodations. To assess how well ‘location, location, location’ applies to short-term accommodations, the chart below shows the median price across different neighborhoods.

As in the previous analysis of review sentiment and location, the above chart shows a relationship between listing price and neighborhood. We can see that listings in South Boston Waterfront go for around $250, while listings in Hyde Park command a significantly lower price of around $50. Interestingly, some of the same names appear in the top and bottom thirds of both the review sentiment and pricing rankings. For example, the Leather District has the second-highest overall price and also the best overall reviews.

Predicting Price

Using insights from previous analysis, we will attempt to develop a linear model that aims to predict listing price given a set of listing features. Our linear model will incorporate variables such as neighborhood, room type, and the number of bedrooms and bathrooms.

The resulting model has an R-squared value of 0.656, indicating that it explains roughly two-thirds of the variation in listing prices! The model's intercept of 30.53 is the predicted price when every feature is zero, so about $31 acts as a baseline that is then adjusted up or down by the characteristics of the listing. Though not perfect, the model clearly picks up on features that should impact pricing, such as room type and location.
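For readers less familiar with R-squared, the following minimal sketch shows how the statistic is computed: one minus the ratio of the model's squared errors to the squared deviations from the mean price. The prices and predictions are invented for illustration and have nothing to do with the 0.656 reported for the actual model.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    # Squared error left over after applying the model.
    residual_ss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of the naive model that always predicts the mean.
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - residual_ss / total_ss


# Hypothetical nightly prices and model predictions, in dollars.
actual_prices = [100.0, 150.0, 200.0, 250.0]
predicted_prices = [110.0, 140.0, 210.0, 240.0]

score = r_squared(actual_prices, predicted_prices)
print(round(score, 3))  # 0.968
```

A score of 1.0 would mean perfect predictions, while 0.0 would mean the model does no better than always guessing the mean price.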

Top 20 Coefficients

For all of the coefficients, a positive value indicates that the feature increases the listing price, while a negative value indicates that it decreases the listing price.

Using the results from the linear regression, a typical 2-bed 1-bath listing for an entire apartment/house in the South Boston Waterfront Neighborhood with an average review sentiment score of 0.5 should go for:

30.53 + (2 × 63.89) + (1 × 31.39) + 109.99 + (0.5 × 32.9) = $316.14

Generally, any price below that could be considered a deal for a similar unit in September, the month in which the listing data was captured.
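The arithmetic above is simply the intercept plus each coefficient multiplied by its feature value, which can be sketched as below. The coefficient values come from the worked example. The feature names are assumptions inferred from the multipliers, and the post does not say whether the 109.99 term belongs to the room type or the neighborhood, so it is given the placeholder name `indicator_term`. `predict_price` and `is_deal` are hypothetical helpers.

```python
INTERCEPT = 30.53

# Values from the worked example; names are assumed for illustration.
COEFFICIENTS = {
    "bedrooms": 63.89,
    "bathrooms": 31.39,
    "indicator_term": 109.99,  # unlabeled in the post: room type or neighborhood
    "review_sentiment": 32.9,
}


def predict_price(features):
    """Intercept plus the sum of coefficient * feature value."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


def is_deal(asking_price, features):
    """Treat any asking price below the model's estimate as a deal."""
    return asking_price < predict_price(features)


example_listing = {
    "bedrooms": 2,
    "bathrooms": 1,
    "indicator_term": 1,      # 1 means the category applies to this listing
    "review_sentiment": 0.5,
}

estimate = predict_price(example_listing)
print(f"${estimate:.2f}")  # $316.14
```

With this in hand, `is_deal(250, example_listing)` returns True for a comparable unit listed at $250 a night.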

Overview

Planning a trip is hard work, but luckily data can help! In this post, we saw how data can inform a trip to Boston. The best pricing happens in the early months of the year, which is also when the most units are available, but that is also when it is coldest. That said, May through July offers decent pricing, slightly below-average inventory, AND great weather!

Now that we have the when out of the way, we can turn to the where. This is a bit trickier and more subjective, because the ‘where’ depends on what type of events and attractions you like. Overall, though, a listing in the Leather District, Bay Village, or Longwood Medical Area looks likely to lead to fun times.

The Data

The data used in this analysis was sourced from Kaggle, where three separate CSVs were made available:

source: https://www.kaggle.com/airbnb/boston

  • Calendar.csv: Listings along with their availability and price for each day from 09/2016 to 09/2017 [1,308,890 records, with 3,585 unique listings]
  • Listings.csv: Detailed data such as property type, address, and neighborhood for each listing as of 09/07/2016. [3,585 records, with 3,585 unique listings]
  • Reviews.csv: Individual reviews for listings dating from 03/2009 to 09/2016. [68,275 records, with 2,829 unique listings]

GitHub: https://github.com/angeles890/Udacity_Write_DS_Blog
