Airbnb Dataset – Microsoft spatial searching

The Inside Airbnb London has the polygons of 33 neighbourhoods and each of the roughly 87900 listings has a latitude and longitude. The WebMinimalAPIDapper Spatial project uses only Microsoft SQL Server’s Spatial functionality for searching and distance calculations. 

Spatial Projections supported by SQL Server

The “magic number” 4326 indicates that the latitude and longitude values are expressed in the World Geodetic System 1984(WGS84) which is also used by the Global Positioning System (GPS) operated by the United States Space Force.

CREATE TABLE [dbo].[Neighbourhood](
	[id] [bigint] IDENTITY(1,1) NOT NULL,
	[NeighbourhoodUID] [uniqueidentifier] NOT NULL,
	[name] [nvarchar](50) NOT NULL,
	[neighbourhood_url] [nvarchar](100) NOT NULL,
	[boundary] [geography] NOT NULL,
 CONSTRAINT [PK_Neighbourhood] PRIMARY KEY CLUSTERED 
(
	[id] ASC
)
-- Then create a spatial index on GEOGRAPHY which contains the boundary polygon(s)
CREATE SPATIAL INDEX [ISX_NeighbourhoodBoundary] ON [dbo].[Neighbourhood]
(
	[boundary]
)

I added a GEOGRAPHY column to the Listing table, populated it using the Latitude and Longitudes of the Listings then added a spatial index.

-- Use latitude and longitude to populate Location GEOGRAPHY column
UPDATE listing
SET Listing.Location = geography::Point(latitude, longitude, 4326)

-- Then index Location column after changing to NOT NULL
CREATE SPATIAL INDEX [IXS_ListingByLocation] ON [dbo].[listing]
(
	[Location]
)

The first spatial search uses the latitude and longitude (most probably extracted from image metadata) to get a Listing’s neighbourhood.

Testing listing in Neighbourhood SQL

It uses the STContains method to find the neighbourhood polygon (if there is one) which the listing location is inside.

const string ListingInNeighbourhoodSQL = @"SELECT neighbourhoodUID, name, neighbourhood_url as neighbourhoodUrl FROM Neighbourhood WHERE Neighbourhood.Boundary.STContains(geography::Point(@Latitude, @Longitude, 4326)) = 1";
...

app.MapGet("/Spatial/Neighbourhood", async (double latitude, double longitude, [FromServices] IDapperContext dapperContext) =>
{
   Model.NeighbourhoodSearchDto neighbourhood;

   using (var connection = dapperContext.ConnectionCreate())
   {
      neighbourhood = await connection.QuerySingleOrDefaultWithRetryAsync<Model.NeighbourhoodSearchDto>(ListingInNeighbourhoodSQL, new { latitude, longitude });
   }

   if (neighbourhood is null)
   {
      return Results.Problem($"Neighbourhood for Latitude:{latitude} Longitude:{longitude} not found", statusCode: StatusCodes.Status404NotFound);
   }

   return Results.Ok(neighbourhood);
})
.Produces<IList<Model.NeighbourhoodSearchDto>>(StatusCodes.Status200OK)
.Produces(StatusCodes.Status404NotFound )
.WithOpenApi();

The case sensitivity of the OGC geography methods tripped me up a few times.

Testing listing Neighbourhood lookup with Swagger user interface

In a future blog post I will compare the performance of STContains vs. STWithin with a load testing application.

Testing listings near a location SQL

The second search simulates a customer looking for listing(s) within a specified distance of a point of interest.

const string ListingsNearbySQL = @"DECLARE @Origin AS GEOGRAPHY = geography::Point(@Latitude, @Longitude, 4326); 
                                  DECLARE @Circle AS GEOGRAPHY = @Origin.STBuffer(@distance); 
                                  --DECLARE @Circle AS GEOGRAPHY = @Origin.BufferWithTolerance(@distance, 0.09,true); 
                                  SELECT uid as ListingUID, Name, listing_url as ListingUrl, 
@Origin.STDistance(Listing.Location) as Distance 
                                  FROM [listing] 
                                  WHERE Listing.Location.STWithin(@Circle) = 1 ORDER BY Distance";
...
app.MapGet("/Spatial/NearbyText", async (double latitude, double longitude, double distance, [FromServices] IDapperContext dapperContext) =>
{
   using (var connection = dapperContext.ConnectionCreate())
   {
      return await connection.QueryWithRetryAsync<Model.ListingNearbyListDto>(ListingsNearbySQL, new { latitude, longitude, distance });
   }
})
.Produces<IList<Model.ListingNearbyListDto>>(StatusCodes.Status200OK)
.WithOpenApi();

The STBuffer command returns a geography object that represents represent a circle centered on @Location with a radius of @distance.

Testing listings near a location with Swagger user interface

The third and final search simulates a customer looking for listing(s) within a specified distance of a point of interest with the latitude and longitude of the listing included in the results.

Testing listings near a location SQL with latitude & Longitude
const string ListingsNearbyLatitudeLongitudeSQL = @"DECLARE @Location AS GEOGRAPHY = geography::Point(@Latitude, @longitude,4326)
                                 DECLARE @Circle AS GEOGRAPHY = @Location.STBuffer(@distance);
                                 SELECT UID as ListingUID
	                              ,[Name]
	                              ,listing_url as ListingUrl
	                              ,Listing.Location.STDistance(@Location) as Distance
	                              ,latitude
                                 ,longitude
                                 FROM [listing]
                                 WHERE Listing.Location.STWithin(@Circle) = 1
                                 ORDER BY Distance";

app.MapGet("/Spatial/NearbyLatitudeLongitude", async (double latitude, double longitude, double distance, [FromServices] IDapperContext dapperContext) =>
{
   using (var connection = dapperContext.ConnectionCreate())
   {
      return await connection.QueryWithRetryAsync<Model.ListingNearbyListLatitudeLongitudeDto>(ListingsNearbyLatitudeLongitudeSQL, new { latitude, longitude, distance });
   }
})
.Produces<IList<Model.ListingNearbyListLatitudeLongitudeDto>>(StatusCodes.Status200OK)
.WithOpenApi();
Testing listings near a location with latitude & Longitude with Swagger user interface

The next couple of posts will use the third-party libraries Geo and NetTopolgySuite

Airbnb Dataset – Neighbourhoods & GEOJSON boundary information

The Inside Airbnb London dataset download has a GeoJSON file with neighbourhood boundaries.

Neighbourhood boundaries rendered with GeoJSON.IO

To load the GeoJSON I used GeoJSON.Text to deserialise the neighbourhood Feature boundaries (polygon & multipolygon).

using System.Text.Json;
using System.Data;
using System.Data.SqlClient;

using Microsoft.SqlServer.Types;

using Dapper;

string jsonString = File.ReadAllText("your geoJSON file");

using (IDbConnection connection = new SqlConnection("This is not the connection string you are looking for"))
{
   connection.Open();

   var neighbourHoods = JsonSerializer.Deserialize<GeoJSON.Text.Feature.FeatureCollection>(jsonString)!;

   Console.WriteLine($"Features:{neighbourHoods.Features.Count}");
   foreach (var feature in neighbourHoods.Features)
   {
      string neighbourhood = feature.Properties["neighbourhood"].ToString();

      Console.WriteLine($"Neightbourhood:{neighbourhood}");

      var geometery = (GeoJSON.Text.Geometry.MultiPolygon)feature.Geometry;

      var s = new SqlGeographyBuilder();

      s.SetSrid(4326);

      s.BeginGeography(OpenGisGeographyType.MultiPolygon);
      s.BeginGeography(OpenGisGeographyType.Polygon); // A
      
      Console.WriteLine($"Polygon cordinates:{geometery.Coordinates.Count}");
      foreach (var coordinates in geometery.Coordinates)
      {
        //s.BeginGeography(OpenGisGeographyType.Polygon); // B

         Console.WriteLine($"Linestring cordinates:{coordinates.Coordinates.Count}");
         foreach (var c in coordinates.Coordinates)
         {
            Console.WriteLine($"Point cordinates:{c.Coordinates.Count}");

            s.BeginFigure(c.Coordinates[0].Latitude, c.Coordinates[0].Longitude, null, null);

            for (int i = 1; i < c.Coordinates.Count; i++)
            {
               s.AddLine(c.Coordinates[i].Latitude, c.Coordinates[i].Longitude);

               Console.Write('.');
            }
            Console.WriteLine();

            s.EndFigure();
         }
         //s.EndGeography(); //B
      }

      s.EndGeography(); //A
      s.EndGeography(); // OpenGisGeographyType.MultiPolygon

      connection.Execute("INSERT INTO Neighbourhood (Name, Boundary) VALUES( @Neighbourhood, geography::STMPolyFromText(@boundary, 4326))", new { neighbourhood, boundary = s.ConstructedGeography.ToString()});

      Console.WriteLine();
   }
}

Console.WriteLine("loaded press <enter> to exit");
Console.ReadLine();

The neighbourhood feature loader utility is pretty “nasty” and was built for my specific scenario

CREATE TABLE [dbo].[Neighbourhood](
	[NeighbourhoodUID] [uniqueidentifier] NOT NULL,
	[Name] [nvarchar](50) NOT NULL,
	[Boundary] [geography] NOT NULL,
 CONSTRAINT [PK_Neighbourhood] PRIMARY KEY CLUSTERED 
(
	[NeighbourhoodUID] ASC
)

I used Dapper to insert rows into the Neighbourhood table

DECLARE @PointGeography AS GEOGRAPHY 

SET @PointGeography = geography::Point(51.512837,-0.2894983, 4326) --Flat
--SET @PointGeography = geography::Point(51.5053469,-0.0262693,4326) -- Canary Wharf
--SET @PointGeography = geography::Point(51.476853,0.0,4326) -- Greenwich
--SET @PointGeography = geography::Point(51.501476,-0.140634, 4326) -- Buckingham palace
--SET @PointGeography = geography::Point(51.533611, -0.318889, 4326) -- Hoover factory
--SET @PointGeography = geography::Point(51.5045, -0.0865, 4326) -- Shard
--SET @PointGeography = geography::Point(51.5145683288574,-0.0194199997931719, 4326) -- D0FD60C0-CC45-4517-91F6-00161E710F28 Tower Hamlets
--SET @PointGeography = geography::Point(51.5553092956543,0.00039999998989515, 4326) -- 80264AED-BC74-4150-B393-02D42711E2E6 Waltham Forest
--SET @PointGeography = geography::Point(51.4925193786621,-0.192310005426407, 4326) -- D36E4D1C-4A35-4B6E-B9A7-01E8D732FD3B Kensington and Chelsea
SET @PointGeography = geography::Point(51.5185317993164,-0.199739992618561, 4326) -- 2CAE3CAE-5E43-4F20-9550-01B86D7EF6FF Westminster

SELECT Name, @PointGeography.STWithin(Neighbourhood.Boundary) as 'STWithin', Neighbourhood.Boundary.STContains(@PointGeography) as 'STContains'
FROM Neighbourhood
ORDER BY Name

To test the neighbourhood geography I built a test harness with some “known” locations 

SQL Server Management Studio Query “inverted” results

The query results were the inverse of what I was expecting.

SQL Server Management Studio Query “reorientated” results

I had forgotten that GeoJSON uses the right-hand rule and Microsoft SQL server uses the left-hand rule for polygons.

UPDATE Neighbourhood SET Boundary = Boundary.ReorientObject()

The neighbourhood dataset is tiny so I used ReorientObject to fix the boundary geography polygons.