Perform Simple EDA on Airbnb Open Dataset

Sometimes even the things we see with our naked eyes are not the “naked” truth. It takes time, conviction and certainty to get to the truth. EDA (Exploratory Data Analysis) does this for any Data Science & Machine Learning enthusiast like us! EDA is one of the crucial steps in data science: it yields the insights and statistical measures that matter to business continuity, stakeholders and data scientists. It also helps define and refine which features will be used in our model.

Exploratory Data Analysis provides a lot of critical information that is very easy to miss – information that helps the analysis in the long run, from framing questions to displaying results. In this blog, we will perform a simple Exploratory Data Analysis in Python on the popular Airbnb Open Dataset, which describes the activity of Airbnb in Seattle, USA. This will cover Airbnb data analytics from simple to intermediate level!

Before diving into our EDA, let’s talk about the data first. Seattle is the 5th most expensive city in the United States based on the cost of living. I am pretty sure that a trip to Seattle would cost quite a bit for the travellers. With the rising demand for more personalized stays and growing competition in the tourism and hotel industry, could Seattle Airbnb be the best choice? Let’s find out some simple yet interesting insights with this amazing dataset! You can find the link to the dataset here.

Assuming you have downloaded the data, let’s understand what the data has for us!

  • Listings: contains all the information about 3,818 Airbnb listings in 92 columns. Each listing is identified by its ‘id’.
  • Calendar: a booking diary covering Jan’16 to Jan’17, with 1,393,570 rows and 4 columns. Every listing id has 365 daily records, each giving its availability and booking price for the day. The booking price is missing if the property is already booked.
  • Reviews: contains reviews of the listings from unique reviewers. I will share my insights on this data in another post.
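Before any plotting, the calendar data usually needs a little cleaning. Here is a minimal sketch, assuming the calendar columns are named `listing_id`, `date`, `available` and `price`, that prices are stored as text such as "$85.00", and that availability is encoded as "t"/"f". These names and encodings are assumptions, and the toy rows are made up purely for illustration:

```python
import pandas as pd


def clean_calendar(calendar: pd.DataFrame) -> pd.DataFrame:
    """Parse dates, availability flags and price strings in a calendar table."""
    out = calendar.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Assumed encoding: "t" means the listing is available on that day.
    out["available"] = out["available"].eq("t")
    # Strip the currency symbol and thousands separators; missing prices stay NaN.
    stripped = out["price"].str.replace(r"[$,]", "", regex=True)
    out["price"] = pd.to_numeric(stripped, errors="coerce")
    return out


# Toy rows invented for illustration; they are not taken from the real dataset.
toy = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "date": ["2016-01-04", "2016-01-05", "2016-01-04"],
    "available": ["t", "f", "t"],
    "price": ["$85.00", None, "$1,250.00"],
})
cleaned = clean_calendar(toy)
print(cleaned)
```

With the real file, you would pass the result of `pd.read_csv(...)` to `clean_calendar` instead of the toy frame.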

I will explain the insights I extracted here, covering data analysis in Python, and at the end of the blog you will find the GitHub link to the whole code!

For our first visual, let’s see what the average daily booking price is for a couple staying in a Seattle Airbnb. The average hotel price in Seattle for a couple is $189. Do you think Airbnb could provide a better deal? To find the answer, I have used the Accommodates feature and plotted it against the booking price.
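As a rough sketch of the aggregation behind such a chart, the listings can be grouped by guest capacity and the price averaged per group. The column names `accommodates` and `price` are assumptions, the price is assumed to be numeric already, and the toy numbers are invented:

```python
import pandas as pd


def price_by_capacity(listings: pd.DataFrame) -> pd.DataFrame:
    """Mean, standard deviation and count of nightly price per guest capacity."""
    return (
        listings.groupby("accommodates")["price"]
        .agg(["mean", "std", "count"])
        .sort_index()
    )


# Toy listings invented for illustration only.
toy_listings = pd.DataFrame({
    "accommodates": [1, 1, 2, 2, 2, 4],
    "price": [50.0, 70.0, 100.0, 110.0, 120.0, 200.0],
})
summary = price_by_capacity(toy_listings)
print(summary)
```

Calling `summary["mean"].plot(kind="bar", yerr=summary["std"])` would draw a bar chart with error bars, provided matplotlib is installed.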

From the above bar graph, you can see that a one-night stay in a Seattle Airbnb would cost close to $91 for one person and $112 for a couple, which is more affordable than the average hotel price in Seattle. The little orange lines show the variation in price. Looks like Airbnb gives us a better deal!

Now, let’s look at how much Seattle Airbnb homes earn at different times of the year. We can find this out by looking at the monthly and weekly booking trends throughout the year.

There is an upward trend in price from Feb’16 to July’16, where the average price reaches its highest value of $153 per day, followed by a downward trend until Nov’16. From the graph, the best time to visit Seattle is from June to August. Summer marks the city’s high season, meaning room rates rise and availability drops, while cold winter weather can deter even the most avid sightseers. The small line at January means that there are two prices, one for 2016 and one for 2017: the average price in Jan’17 is higher than in Jan’16.
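A monthly trend like the one described above could be computed along these lines. This is only a sketch: it assumes a cleaned calendar with a datetime `date` column and a numeric `price` column (both names are assumptions), and the toy rows are invented:

```python
import pandas as pd


def monthly_average_price(calendar: pd.DataFrame) -> pd.Series:
    """Average listed price per calendar month, ignoring days with no price."""
    priced = calendar.dropna(subset=["price"])
    # to_period("M") keeps the year, so Jan 2016 and Jan 2017 stay separate.
    month = priced["date"].dt.to_period("M")
    return priced.groupby(month)["price"].mean()


# Toy calendar invented for illustration only.
toy_calendar = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-04", "2016-01-05", "2016-07-01", "2016-07-02"]
    ),
    "price": [90.0, 110.0, 150.0, None],
})
monthly = monthly_average_price(toy_calendar)
print(monthly)
```

Grouping by `priced["date"].dt.month` instead would merge both Januaries into a single bucket, which is closer to a chart that shows one bar per month name.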

Now, let’s find out if there's a spike in booking prices during the weekends. Usually, people like to do weekend getaways to get some time off from stressful work. Let’s find out if the same is applicable to Airbnb homestays.

There is a periodic small peak in the graph which could be an indicator of seasonality on weekends. I have further plotted the price by weekday to go one level deeper.

The above boxplot shows a peak in the booking price on Friday and Saturday. Clearly, there is a weekly trend where the listing prices on the weekends are higher than on other weekdays.
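The weekday comparison can be sketched in the same way, by deriving the day name from each date. As before, the `date` and `price` column names are assumptions and the toy rows are invented:

```python
import pandas as pd

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def weekday_average_price(calendar: pd.DataFrame) -> pd.Series:
    """Average listed price per day of the week, ordered Monday to Sunday."""
    priced = calendar.dropna(subset=["price"])
    day = priced["date"].dt.day_name()
    # reindex puts the days in calendar order; days with no data become NaN.
    return priced.groupby(day)["price"].mean().reindex(WEEKDAYS)


# Toy calendar invented for illustration only (4 Jan 2016 was a Monday).
toy_week = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-04", "2016-01-08", "2016-01-09", "2016-01-11"]
    ),
    "price": [100.0, 120.0, 130.0, 110.0],
})
by_day = weekday_average_price(toy_week)
print(by_day)
```

For a boxplot rather than a table of means, the same `day` labels can be used as the grouping variable in whichever plotting library you prefer.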

For the next series of visuals, we’ll look at the busiest and the most expensive neighbourhoods in Seattle: first the Top 10 busiest neighbourhoods, then the Top 10 most expensive ones.

Starting with the Top 10 busiest neighbourhoods in Seattle: the neighbourhoods that are hubs of Airbnb homestays should be the busiest ones, with a lot of tourist activity. I have used the neighbourhood group (district) and neighbourhood columns to find the busiest neighbourhood.
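One way to rank neighbourhoods by number of listings is sketched below. The column names `neighbourhood_group_cleansed` and `neighbourhood_cleansed` are assumptions about how the district and neighbourhood columns are labelled, and the toy rows use placeholder names:

```python
import pandas as pd


def busiest_neighbourhoods(listings: pd.DataFrame, top: int = 10) -> pd.Series:
    """Count listings per (district, neighbourhood) and keep the largest ones."""
    return (
        listings.groupby(
            ["neighbourhood_group_cleansed", "neighbourhood_cleansed"]
        )
        .size()
        .sort_values(ascending=False)
        .head(top)
    )


# Toy listings with placeholder names, invented for illustration only.
toy_places = pd.DataFrame({
    "neighbourhood_group_cleansed": ["A", "A", "A", "A", "B", "B"],
    "neighbourhood_cleansed": ["N1", "N1", "N1", "N2", "N3", "N3"],
})
busiest = busiest_neighbourhoods(toy_places)
print(busiest)
```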

Broadway in Capitol Hill seems to be the most crowded neighbourhood in Seattle with 397 Airbnb properties, followed by Belltown in Downtown with 234 properties. Now, on to the Top 10 most expensive neighbourhoods in Seattle!

Southeast Magnolia is the most expensive place to stay in Seattle, with an average price per night of $232, followed by Portage Bay in Capitol Hill with an average price of close to $227. If you want to experience a posh homestay, book your stay in one of these neighbourhoods, where the average price per night ranges from $170 to $232.
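A ranking by average price can be sketched as follows. The column names are again assumptions, and the `min_listings` parameter is a hypothetical addition of mine rather than something the original analysis used; it is there because a neighbourhood with only a handful of listings can top such a ranking by chance:

```python
import pandas as pd


def priciest_neighbourhoods(
    listings: pd.DataFrame, top: int = 10, min_listings: int = 1
) -> pd.DataFrame:
    """Mean nightly price and listing count per neighbourhood, highest first."""
    stats = listings.groupby("neighbourhood_cleansed")["price"].agg(
        ["mean", "count"]
    )
    # Optionally drop neighbourhoods with too few listings to trust the mean.
    stats = stats[stats["count"] >= min_listings]
    return stats.sort_values("mean", ascending=False).head(top)


# Toy listings with placeholder names, invented for illustration only.
toy_prices = pd.DataFrame({
    "neighbourhood_cleansed": ["N1", "N2", "N2", "N3", "N3"],
    "price": [300.0, 200.0, 220.0, 100.0, 120.0],
})
priciest = priciest_neighbourhoods(toy_prices)
robust = priciest_neighbourhoods(toy_prices, min_listings=2)
print(priciest)
```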

In the next and final section, we will look at the most common types of Airbnb properties in Seattle.
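A simple way to tabulate property types and their share of all listings is sketched here. The column name `property_type` is an assumption, and the toy rows are invented:

```python
import pandas as pd


def property_type_share(listings: pd.DataFrame) -> pd.DataFrame:
    """Listing count and percentage share per property type, largest first."""
    counts = listings["property_type"].value_counts()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})


# Toy listings invented for illustration only.
toy_types = pd.DataFrame({
    "property_type": ["House"] * 5 + ["Apartment"] * 4 + ["Yurt"],
})
shares = property_type_share(toy_types)
print(shares)
```

To break the counts down by room type as well, `pd.crosstab` on the property-type and room-type columns would give a two-way table.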

Houses and Apartments make up 90% of all the Airbnb properties in Seattle, with 1733 Houses followed by 1708 Apartments. Yurts, Dorms and Chalets are close to nonexistent in Seattle. And as you can see from the above plot, most of the properties are offered as a whole House/Apt or a Private room rather than as a shared room. Here is the link to the detailed code for these visualizations:

Though there are numerous ways to find out what factors affect the Airbnb booking price, which ones matter most depends on the traveller’s purpose of visit. I am pretty sure the above insights give a big picture of Airbnb homestays in Seattle. Well, this was a simple EDA of the dataset, yet there are hundreds more visuals that could be extracted from it!

We at Board Infinity recommend experimenting more and more with the data so that you sharpen your skills as a beginner in Data Science. Thank you for giving your precious time, and stay tuned for more amazing blogs with us!