For the next series of random posts about ASP.NET Core Minimal APIs I wanted a much larger dataset for testing. When looking for datasets for another project I stumbled across Inside Airbnb which has a selection of datasets scraped from the Airbnb website. I started with the Melbourne Victoria Australia dataset which had 24652 listings.
I used the SQL Server Managment Studio (SSMS) “Import flat file” task to load listings.csv.
I had to manually update some of the “suggested” column types which “broke” the import flat file task.
The tinyint or smallint columns which caused conversion errors were changed to an integer column and any text column that which caused a truncation error was changed to an NVARCHAR(MAX) column.
The Melbourne Victoria Australia dataset turned out to be pretty “nasty” with lots of nullable and columns that looked like several values had been concatenated.
CREATE TABLE [dbo].[listings](
[id] [bigint] NOT NULL,
[listing_url] [nvarchar](50) NOT NULL,
[scrape_id] [datetime2](7) NOT NULL,
[last_scraped] [date] NOT NULL,
[source] [nvarchar](50) NOT NULL,
[name] [nvarchar](100) NOT NULL,
[description] [nvarchar](1050) NULL,
[neighborhood_overview] [nvarchar](1050) NULL,
[picture_url] [nvarchar](150) NOT NULL,
[host_id] [int] NOT NULL,
[host_url] [nvarchar](50) NOT NULL,
[host_name] [nvarchar](50) NULL,
[host_since] [date] NULL,
[host_location] [nvarchar](50) NULL,
[host_about] [nvarchar](max) NULL,
[host_response_time] [nvarchar](50) NULL,
[host_response_rate] [nvarchar](50) NULL,
[host_acceptance_rate] [nvarchar](50) NULL,
[host_is_superhost] [bit] NULL,
[host_thumbnail_url] [nvarchar](150) NULL,
[host_picture_url] [nvarchar](150) NULL,
[host_neighbourhood] [nvarchar](50) NULL,
[host_listings_count] [int] NULL,
[host_total_listings_count] [int] NULL,
[host_verifications] [nvarchar](50) NOT NULL,
[host_has_profile_pic] [nvarchar](50) NULL,
[host_identity_verified] [bit] NULL,
[neighbourhood] [nvarchar](max) NULL,
[neighbourhood_cleansed] [nvarchar](50) NOT NULL,
[neighbourhood_group_cleansed] [nvarchar](1) NULL,
[latitude] [float] NOT NULL,
[longitude] [float] NOT NULL,
[property_type] [nvarchar](50) NOT NULL,
[room_type] [nvarchar](50) NOT NULL,
[accommodates] [tinyint] NOT NULL,
[bathrooms] [nvarchar](1) NULL,
[bathrooms_text] [nvarchar](50) NULL,
[bedrooms] [tinyint] NULL,
[beds] [tinyint] NULL,
[amenities] [nvarchar](max) NOT NULL,
[price] [money] NOT NULL,
[minimum_nights] [smallint] NOT NULL,
[maximum_nights] [int] NOT NULL,
[minimum_minimum_nights] [smallint] NOT NULL,
[maximum_minimum_nights] [int] NOT NULL,
[minimum_maximum_nights] [int] NOT NULL,
[maximum_maximum_nights] [int] NOT NULL,
[minimum_nights_avg_ntm] [float] NOT NULL,
[maximum_nights_avg_ntm] [float] NOT NULL,
[calendar_updated] [nvarchar](1) NULL,
[has_availability] [bit] NOT NUL\L,
[availability_30] [tinyint] NOT NULL,
[availability_60] [tinyint] NOT NULL,
[availability_90] [tinyint] NOT NULL,
[availability_365] [smallint] NOT NULL,
[calendar_last_scraped] [date] NOT NULL,
[number_of_reviews] [int] NOT NULL,
[number_of_reviews_ltm] [int] NOT NULL,
[number_of_reviews_l30d] [tinyint] NOT NULL,
[first_review] [date] NULL,
[last_review] [date] NULL,
[review_scores_rating] [float] NULL,
[review_scores_accuracy] [float] NULL,
[review_scores_cleanliness] [float] NULL,
[review_scores_checkin] [float] NULL,
[review_scores_communication] [float] NULL,
[review_scores_location] [float] NULL,
[review_scores_value] [float] NULL,
[license] [nvarchar](1) NULL,
[instant_bookable] [bit] NOT NULL,
[calculated_host_listings_count] [int] NOT NULL,
[calculated_host_listings_count_entire_homes] [tinyint] NOT NULL,
[calculated_host_listings_count_private_rooms] [tinyint] NOT NULL,
[calculated_host_listings_count_shared_rooms] [bit] NULL,
[reviews_per_month] [float] NULL,
CONSTRAINT [PK_listings] PRIMARY KEY CLUSTERED
(
[id] ASC
)
On the Kaggle website there was a study of an Airbnb dataset from Singapore which looked better but was a significantly smaller with only 3483 listings. After some “exploration” of the available datasets on Inside Airbnb the London one looked pretty good.
The “suggested” table structure for the London dataset looks a lot better so I will use if for my ASP.NET Core Minimal APIs posts
CREATE TABLE [dbo].[listings](
[id] [bigint] NOT NULL,
[name] [nvarchar](max) NOT NULL,
[host_id] [int] NOT NULL,
[host_name] [nvarchar](50) NULL,
[neighbourhood_group] [nvarchar](1) NULL,
[neighbourhood] [nvarchar](50) NOT NULL,
[latitude] [float] NOT NULL,
[longitude] [float] NOT NULL,
[room_type] [nvarchar](50) NOT NULL,
[price] [money] NOT NULL,
[minimum_nights] [smallint] NOT NULL,
[number_of_reviews] [int] NOT NULL,
[last_review] [date] NULL,
[reviews_per_month] [float] NULL,
[calculated_host_listings_count] [int] NOT NULL,
[availability_365] [smallint] NOT NULL,
[number_of_reviews_ltm] [int] NOT NULL,
[license] [nvarchar](50) NULL,
CONSTRAINT [PK_listings] PRIMARY KEY CLUSTERED
(
[id] ASC
)
Some of the columns will need to be “unpacked” e.g. “Rental unit in Islington · ★4.80 · 1 bedroom · 1 bed · 1 shared bath” but this shouldn’t be an issue.




