The Inside Airbnb London dataset is regularly scraped from the Airbnb site a surprising amount of data is duplicated, or incomplete.
CREATE TABLE [dbo].[listingsRaw](
[id] [bigint] NOT NULL,
[listing_url] [nvarchar](50) NOT NULL,
[scrape_id] [datetime2](7) NOT NULL,
[last_scraped] [date] NOT NULL,
[source] [nvarchar](50) NOT NULL,
[name] [nvarchar](max) NOT NULL,
[description] [nvarchar](max) NULL,
[neighborhood_overview] [nvarchar](1050) NULL,
[picture_url] [nvarchar](150) NULL,
[host_id] [int] NOT NULL,
[host_url] [nvarchar](50) NOT NULL,
[host_name] [nvarchar](50) NULL,
[host_since] [date] NULL,
[host_location] [nvarchar](100) NULL,
[host_about] [nvarchar](max) NULL,
[host_response_time] [nvarchar](50) NULL,
[host_response_rate] [nvarchar](50) NULL,
[host_acceptance_rate] [nvarchar](50) NULL,
[host_is_superhost] [bit] NULL,
[host_thumbnail_url] [nvarchar](150) NULL,
[host_picture_url] [nvarchar](150) NULL,
[host_neighbourhood] [nvarchar](50) NULL,
[host_listings_count] [int] NULL,
[host_total_listings_count] [int] NULL,
[host_verifications] [nvarchar](50) NOT NULL,
[host_has_profile_pic] [bit] NULL,
[host_identity_verified] [bit] NULL,
[neighbourhood] [nvarchar](100) NULL,
[neighbourhood_cleansed] [nvarchar](50) NOT NULL,
[neighbourhood_group_cleansed] [nvarchar](1) NULL,
[latitude] [float] NOT NULL,
[longitude] [float] NOT NULL,
[property_type] [nvarchar](50) NOT NULL,
[room_type] [nvarchar](50) NOT NULL,
[accommodates] [tinyint] NOT NULL,
[bathrooms] [nvarchar](1) NULL,
[bathrooms_text] [nvarchar](50) NULL,
[bedrooms] [tinyint] NULL,
[beds] [tinyint] NULL,
[amenities] [nvarchar](max) NOT NULL,
[price] [money] NOT NULL,
[minimum_nights] [smallint] NOT NULL,
[maximum_nights] [int] NOT NULL,
[minimum_minimum_nights] [smallint] NULL,
[maximum_minimum_nights] [int] NULL,
[minimum_maximum_nights] [int] NULL,
[maximum_maximum_nights] [int] NULL,
[minimum_nights_avg_ntm] [float] NULL,
[maximum_nights_avg_ntm] [float] NULL,
[calendar_updated] [nvarchar](1) NULL,
[has_availability] [bit] NOT NULL,
[availability_30] [tinyint] NOT NULL,
[availability_60] [tinyint] NOT NULL,
[availability_90] [tinyint] NOT NULL,
[availability_365] [smallint] NOT NULL,
[calendar_last_scraped] [date] NOT NULL,
[number_of_reviews] [smallint] NOT NULL,
[number_of_reviews_ltm] [int] NOT NULL,
[number_of_reviews_l30d] [tinyint] NOT NULL,
[first_review] [date] NULL,
[last_review] [date] NULL,
[review_scores_rating] [float] NULL,
[review_scores_accuracy] [float] NULL,
[review_scores_cleanliness] [float] NULL,
[review_scores_checkin] [float] NULL,
[review_scores_communication] [float] NULL,
[review_scores_location] [float] NULL,
[review_scores_value] [float] NULL,
[license] [nvarchar](max) NULL,
[instant_bookable] [bit] NOT NULL,
[calculated_host_listings_count] [int] NULL,
[calculated_host_listings_count_entire_homes] [int] NOT NULL,
[calculated_host_listings_count_private_rooms] [int] NOT NULL,
[calculated_host_listings_count_shared_rooms] [int] NOT NULL,
[reviews_per_month] [float] NULL
)
The ListingsRaw table had 87946 rows and there were 53394 hosts so some hosts must have many listings
SELECT COUNT(*) FROM listingsRaw
SELECT COUNT(DISTINCT HOST_ID) FROM listingsRaw
When I queried the TOP(5) hosts
SELECT [Host_id], COUNT(Host_id) as HostCount
FROM listingsRaw
GROUP BY host_id
ORDER BY COUNT(Host_id) DESC, host_id ASC
Host_id HostCount
129230780 595
33889201 312
314162972 301
28820321 288
1432477 259
I was interested in the five hosts with the most listings.
Host_id: 129230780
Name: STK Homes
About: Hello from Staykeepers Homes! We are a group of people who love traveling and sharing their experiences with others. But more so, we like to think of ourselves as people who constantly search to challenge and improve themselves.
URL: https://www.airbnb.com/users/show/129230780
Host_id: 33889201
Name: Veeve
About: Greetings from the Veeve team! Veeve manage the largest collection of homes to rent for short stays in London since 2011, launching in Paris in 2017. Our rental homes are predominantly privately-owned, lived-in homes in vibrant neighbourhoods. Thanks to our fully managed service, when you stay in a Veeve property you can benefit from the spaciousness, personal style and relaxed feel of a home – but with all the amenities of a hotel. Once a booking is confirmed our Customer Service team will become your main point of contact and get in touch with you to organise your arrival at the property (including airport transfers, upon request); after which our on-the-ground management team will look after all aspects of your stay, with in-stay support available to help with any queries. We have strict cleanliness guidelines for our cleaners and all staff who enter our properties and have reinforced existing cleaning procedures in line with official WHO, British and French government guidance. As always, our team ensure that all properties are cleaned to a professional standard before and after each stay, with increased focus on touch-point cleaning (door handles, kitchen surfaces, kettles, taps etc.). All staff who enter properties will follow the below procedures: - Carry and regularly use alcohol-based hand rub - Sanitise their mobile phones - Focus extra attention when cleaning key touch-point areas (door handles, kitchen surfaces, kettles, taps etc.) - Wear protective gear where appropriate - Wash and disinfect work shoes We might be able to arrange for property viewings, upon request, for stays of over 28 nights. Please be aware once a booking is confirmed you will be required to complete a Rental Agreement and also ID verification via our partners Superhog. This is mandatory for all our stays and in accordance to short lets requirements with Veeve. If you have any questions, please do not hesitate to contact us and we will be sure to answer you promptly.
URL: https://www.airbnb.com/users/show/33889201
Host_id: 314162972
Name: Blueground
About: We’re Blueground, a global proptech company with several thousand move-in-ready apartments in a growing number of major cities around the world. With flexible terms and homes in vibrant, centrally based neighborhoods, you’ll feel at home and free to roam for as long as you want — a month, a year, or longer. Each apartment is thoughtfully designed with exclusive furnishings, fully equipped kitchens, and incredible amenities – making every day a five-star experience. From day one, you’ll enjoy high-speed Wi-Fi, premium linens, and smart home entertainment. Plus, access to pools, gyms, and outdoor spaces in select buildings. Why stress over your apartment? We provide a hassle-free alternative — a consistent, quality guest experience that starts even before you arrive. Because we let you book our most up-to-date apartment listings online, confirm with a click, pay securely, and check in easily. During your stay Upon arrival, you’ll either be greeted personally by a Blueground team member or given self-check-in instructions. The entire apartment is yours! You’ll enjoy reliable support via email, phone, and our Guest App, where you can request everything from a home cleaning to extra towels. We’ll share all details upon confirmation of your stay.
URL: https://www.airbnb.com/users/show/314162972
Host_id: 28820321
Name: Veronica
About: I moved to London to follow my love who is a Londoner, I have a 4 years old daughter and two cats. we live in a beautiful house surrounded by parks and trees in Muswell Hill. The Airbnb experience has been very positive and from that experience I set up a company called HelloGuest to provide excellent hassle-free Airbnb management services for short stay rental properties in London and across all the UK advertised on Airbnb. Our services cover everything from full property management (guest communication, check-in & out, cleaning & laundry, manage the listing, 24/7 support service, etc) to partly managed or individual services depending on your needs. Our aim is to provide professional services for high quality properties to deliver an excellent guest experience. We are here to help, so please get in touch with any questions you have.
URL: https://www.airbnb.com/users/show/28820321
host_id: 1432477
Name: City Relay
About: City Relay is a property management company that looks after a bright and diverse portfolio of properties in London! We are committed to offering the best stay to each and every one of our guests. We provide some unique services that most of the other hosts cannot provide. All our properties are selected carefully in order to offer you the best stay! From the moment you book to the moment you arrive at the apartment, we are helping you through all the processes so that your stay is hassle-free! In order to address health and safety concerns, all our properties receive a deep professional cleaning and disinfection after each stay.
URL: https://www.airbnb.com/users/show/1432477
After looking at the listings table this was my first attempt at a Host table.
CREATE TABLE [dbo].[Host](
[Id] [bigint] NOT NULL,
[Url] [nvarchar](150) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Since] [date] NOT NULL,
[About] [nvarchar](max) NULL,
[Response_time] [nvarchar](50) NOT NULL,
[Response_rate] [nvarchar](50) NOT NULL,
[Acceptance_rate] [nvarchar](50) NOT NULL,
[Thumbnail_url] [nvarchar](150) NOT NULL,
[Picture_url] [nvarchar](150) NOT NULL,
[Identity_verified] [bit] NOT NULL,
[Identity_verifications] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Host] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
Which I populated with the following SELECT INTO statement
SELECT DISTINCT [Host_id] as Id
,host_url as url
,[Host_Name] as Name
,host_since as Since
---,host_location as
,host_about as About
,host_response_time as Response_time
,host_response_rate as ResponseRate
,host_acceptance_rate as Acceptance_rate
--,host_is_superhost
,host_thumbnail_url as thumbnail_url
,host_picture_url as picture_url
--,host_neighbourhood
--,host_listings_count
--,host_total_listings_count
,host_verifications as Identity_verifications
--,host_has_profile_pic
,host_identity_verified as Identity_verified
FROM listingsRaw
INTO Host
After extracting the hosts information both needed some more “curation” so I wrote some queries (like the following) to find incomplete records.
SELECT id, Url from Host WHERE Url is null
SELECT id, Name from Host WHERE Name is null
SELECT id, Since from Host WHERE Since is null
SELECT id, response_time from Host WHERE response_time is null
SELECT id, response_rate from Host WHERE response_rate is null
SELECT id, Acceptance_rate from Host WHERE Acceptance_rate is null
SELECT id, Thumbnail_url from Host WHERE Thumbnail_url is null
SELECT id, Picture_url from Host WHERE Picture_url is null
SELECT id, Identity_verified from Host WHERE Identity_verified is null
SELECT id, Identity_verifications from Host WHERE Identity_verifications is null
SELECT id, About from Host WHERE About is null
At the end of my data cleansing the Listing table has 87940 rows and the Host table 53388 rows.













