Geospatial Search with SOLR and Solarium


In a recent series of articles I looked in detail at Apache’s SOLR and Solarium.

To recap: SOLR is a search service with a raft of features – such as faceted search and result highlighting – which runs as a web service. Solarium is a PHP library which allows you to integrate with SOLR – whether local or remote – interacting with it as if it were a native component of your application. If you’re unfamiliar with either, then my series is over here, and I’d urge you to take a look.

In this article, I’m going to look at another part of SOLR which warrants its own discussion: geospatial search.


An Example

I’ve put together a simple example application to accompany this article. You can get it from GitHub, or see it in action here.

Before we delve into that, let’s look at some of the background.

Sometimes, the things you want to search for have geographical locations. Often, that provides vital context. It’s all very well me being able to search for “Italian restaurants”, but I’m hungry – a restaurant on another continent, as good as it might be, is of no help. Rather, it would be far more useful to be able to run a search which asks “show me Italian restaurants, but within 5 miles”. Or alternatively, “show me the ten closest Italian restaurants”. That’s where Geospatial search comes in.

Geospatial Search and Points

In geospatial applications we often talk about “points”; i.e., specific geographical locations. In practice, that means a latitude and longitude pair, which defines a point on the globe, potentially to within a few metres.
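To make the idea of "distance between two points" concrete, here's a minimal sketch of the great-circle (haversine) calculation in PHP. It isn't part of the example application: the haversineKm function and the mean Earth radius of 6371 km are my own assumptions for illustration, and SOLR's internal calculation may differ slightly.

```php
<?php
// Illustrative only: great-circle distance between two points, in kilometres.
// haversineKm() is a hypothetical helper, not part of SOLR or Solarium.
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0; // mean Earth radius (assumed)

    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);

    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    // min() guards against rounding pushing the value fractionally above 1
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Manchester to Berlin, using approximate city-centre coordinates
echo round(haversineKm(53.4808, -2.2426, 52.5167, 13.3333)), " km\n";
```

This is the sort of sum a geospatial search has to do for every candidate document, which is why it's worth letting the search engine handle it.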

One of the challenges when you’re developing anything involving geographic points is that you need some way of making sense of them for people who don’t think in latitude and longitude – which I’m pretty sure is most of us. Geolocation comes in handy here, because it can be used to determine the latitude and longitude of “where you are”, without the ambiguities of place names. (If you want to take the latter approach, I’ve written about it before.)

So, the first challenge when you’re doing any sort of geo-related work is to work out how to determine the start point – i.e., where to search from. In our example application we’ll hedge our bets and take three approaches. We’ll use the HTML5 geolocation functionality to allow the user’s browser to locate them. For convenience and simplicity we’ll include an arbitrary list of some major cities, which when clicked will populate the latitude and longitude from some hard-coded values. Finally, just so we have all bases covered, and for the geo-geeks among us, we’ll include text fields in which users can manually enter their latitude and longitude.

Setting up the Schema

In order to get our SOLR core setup to support geographical locations, we need to perform some tweaks to the schema.

The first thing we need to do is to add the location field type to schema.xml:

<fieldType name="location"  class="solr.LatLonType" subFieldSuffix="_coordinate"/>

Note that this field is made up of sub-fields; i.e., a latitude and a longitude. We need to ensure we have a suitable type for those:

<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

As you can see, it’s basically a field of type double (specifically tdouble, represented internally by the Java class solr.TrieDoubleField).

Both of these <fieldType> declarations need to be placed within the <types> element of your schema.xml.

Now that the types are set up, you can define a new field to hold the latitude and longitude. In the following example, I’m calling it latlon:

<field name="latlon"        type="location" indexed="true"  stored="true"  multiValued="false" />

It’s important that multiValued is set to false – multiple lat/lon pairs aren’t supported.

You’ll also need to set up a dynamic field for the components; i.e. the latitude and longitude. _coordinate refers to the suffix we specified when we defined our location field type above.

<dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

Both the <field> and <dynamicField> declarations go in the <fields> section.

Your schema is now set up to support latitude / longitude pairs, and we’ve added a field called latlon. Next, let’s look at how to populate that field.

You’ll find an example schema.xml file in the sample application’s repository.

Assigning Location Data

When it comes to assigning a value to a location field, you provide the latitude and longitude as a single comma-separated string, in this format:

{lat},{long}

So, using Solarium:

$doc->latlon = doubleval($latitude) . "," . doubleval($longitude);

Refer to the section “Populating the Data” for a concrete example.
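One thing to be aware of is that doubleval() returns 0 for a non-numeric string, so bad input would quietly place a document at 0,0. Here's a minimal sketch of a stricter alternative; formatLatLon is a hypothetical helper of my own, not something Solarium provides.

```php
<?php
// Hypothetical helper: build the "lat,lon" string a location field expects,
// rejecting anything that is not a plausible coordinate.
function formatLatLon($latitude, $longitude): string
{
    if (!is_numeric($latitude) || !is_numeric($longitude)) {
        throw new InvalidArgumentException('Latitude and longitude must be numeric.');
    }

    $lat = (float) $latitude;
    $lon = (float) $longitude;

    if ($lat < -90 || $lat > 90 || $lon < -180 || $lon > 180) {
        throw new InvalidArgumentException('Coordinates are out of range.');
    }

    return $lat . ',' . $lon;
}

echo formatLatLon('52.5167', '13.3333'), "\n"; // prints "52.5167,13.3333"
```

You could then assign the result directly, e.g. $doc->latlon = formatLatLon($latitude, $longitude);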

Geospatial Queries in SOLR with Solarium

You might recall that in part three of the SOLR series, we looked at Solarium’s helpers. Basically, these act as syntactic sugar, enabling you to create more complex queries without having to worry too much about the underlying SOLR query syntax.

Here’s an example of how to add an additional filter to a search query, which – given a $latitude and a $longitude – limits the results to within $distance kilometres:

$query->createFilterQuery('distance')->setQuery(
	$helper->geofilt(
		'latlon', 
		doubleval($latitude),
		doubleval($longitude),
		doubleval($distance)
	)
);

If you prefer to work in miles, multiply $distance by 1.609344 to convert it to kilometres:

$query->createFilterQuery('distance')->setQuery(
	$helper->geofilt(
		'latlon', 
		doubleval($latitude),
		doubleval($longitude),
		doubleval($distance) * 1.609344
	)
);

If you want to return the distance with the search results, you’ll need to add the geodist function to the list of fields, using the same values as the geofilt filter. Again, you can use a helper:

$query->addField($helper->geodist(
	'latlon', 
	doubleval($latitude), 
	doubleval($longitude)
	)
);

It’s far more useful to add a field alias, much like you would in SQL, which you can use to retrieve the value later. The convention with aliases is to prefix and suffix with an underscore, like so:

$query->addField('_distance_:' . $helper->geodist(
	'latlon', 
	doubleval($latitude), 
	doubleval($longitude)
	)
);

Now, you can display the distance in your search results:

<ul>
	<?php foreach ($resultset as $document): ?>
	<li><?php print $document->title ?> (<?php print round($document->_distance_, 2) ?> kilometres away)</li>
	<?php endforeach; ?>
</ul>

In order to sort the results by distance, you need to apply a little trickery. Rather than use setSort, you actually need to use a query; this is then used to “score” results based on distance. The underlying SOLR query will look like this:

{!func}geodist(fieldname,lat,lng)

To do this with Solarium, again using a helper:

$query->setQuery('{!func}' . $helper->geodist(
	'latlon', 
	doubleval($latitude), 
	doubleval($longitude)
));

The net result of this is that the score will reflect the proximity; the lower the score, the closer it is geographically.

So, to sort the results by distance, closest first:

$query->addSort('score', 'asc');
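If you're curious what all of this amounts to once it reaches SOLR, the sketch below builds the equivalent request parameters by hand. It's for illustration only: buildGeoParams is a hypothetical function of my own, and the exact strings Solarium generates may differ in ordering or number formatting.

```php
<?php
// Illustrative only: the request parameters behind a distance-filtered,
// distance-sorted search. buildGeoParams() is hypothetical, not a Solarium API.
function buildGeoParams(string $field, float $lat, float $lng, float $distanceKm): array
{
    $geodist = sprintf('geodist(%s,%F,%F)', $field, $lat, $lng);

    return [
        // score each document by its distance from the point
        'q'    => '{!func}' . $geodist,
        // only keep documents within $distanceKm of the point
        'fq'   => sprintf('{!geofilt sfield=%s pt=%F,%F d=%F}', $field, $lat, $lng, $distanceKm),
        // return all stored fields plus the distance under an alias
        'fl'   => '*,_distance_:' . $geodist,
        // lowest score (i.e. closest) first
        'sort' => 'score asc',
    ];
}

$params = buildGeoParams('latlon', 52.5167, 13.3333, 50);
echo http_build_query($params), "\n";
```

Seeing the raw parameters can be handy when you're debugging a query in the SOLR admin interface.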

Enough of the theory; let’s build something.

Building our Example Application

I’ve created a simple example application where people can search for their nearest airports, which you can find on GitHub, in the solr folder. There’s an online demo here.

It uses Silex as a framework, along with Twig for templating. You shouldn’t need an in-depth knowledge of either in order to follow along, since most of the application’s complexity comes from the SOLR integration, which is covered here.

Populating the Data

The data we’re using is taken from the excellent OpenFlights.org service. You’ll find the data file in the repository, along with a simple script to populate the search index – run it as follows:

php scripts/populate.php

Here’s the relevant section:

// Now let's start importing
while (($row = fgetcsv($fp, 1000, ",")) !== FALSE) {

	// get an update query instance
	$update = $client->createUpdate();

	// Create a document
	$doc = $update->createDocument();    

	$doc->id = $row[0];
	$doc->name = $row[1];
	$doc->city = $row[2];
	$doc->country = $row[3];
	$doc->faa_faa_code = $row[4];
	$doc->icao_code = $row[5];
	$doc->altitude = $row[8];

	$doc->latlon = doubleval($row[6]) . "," . doubleval($row[7]);

	// Let's simply add and commit straight away.
	$update->addDocument($doc);
	$update->addCommit();

	// this executes the query and returns the result
	$result = $client->update($update);

	$num_imported++;

	// Sleep for a couple of seconds, lest we go too fast for SOLR
	sleep(2);

}

Building the Search Form

We’ll start with a simple form with latitude and longitude fields, as well as a drop-down with which the user can specify the distance to limit to:

<form method="get" action="/">

	<div class="form-group">
		<a href="#/" id="findme" class="btn btn-default"><i class="icon icon-target"></i> Find my location</a>
	</div>

	<div class="form-group">
		<label for="form-lat">Latitude</label>
		<input type="text" name="lat" id="form-lat" class="form-control" />
	</div>

	<div class="form-group">
		<label for="form-lng">Longitude</label>
		<input type="text" name="lng" id="form-lng" class="form-control" />
	</div>

	<div class="form-group">
		<label for="form-dist">Within <em>x</em> kilometers</label>
		<select name="dist" id="form-dist" class="form-control">		            			            	
			<option value="50">50</option>
			<option value="100">100</option>
			<option value="250">250</option>
			<option value="500">500</option>		            			            	
		</select>
	</div>

	<div class="form-group">
		<button type="submit" class="btn btn-primary"><i class="icon icon-search"></i>  Search</button>		          	
	</div>
</form>

Next, let’s implement the “find me” button, which uses HTML5 geolocation – if the user’s browser supports it – to populate the search form.

function success(position) {
	$('input[name="lat"]').val(position.coords.latitude);
	$('input[name="lng"]').val(position.coords.longitude);		  
}

function error(msg) {
	alert(msg);
}

$('#findme').click(function(){
	if (navigator.geolocation) {
		navigator.geolocation.getCurrentPosition(success, error);
	} else {
		error('not supported');
	}
});

Users will need to grant our application permission to locate them, so really it’s best to run this upon some sort of user interaction, such as at the click of a button, rather than on page-load.

Finally, we’ll provide a list of “default” cities; a user can click one to populate the latitude and longitude fields automatically.

Here’s the HTML, showing a limited number of cities for brevity:

<ul id="cities">
	<li><a href="#/" data-lat="52.51670" data-lng="13.33330">Berlin, Germany</a></li>
	<li><a href="#/" data-lat="-34.33320" data-lng="-58.49990">Buenos Aires, Argentina</a></li>
	<!-- more cities omitted -->
</ul>

The corresponding JavaScript is extremely simple:

$('#cities a').click(function(e){
	$('input[name="lat"]').val($(this).data('lat'));
	$('input[name="lng"]').val($(this).data('lng'));
});

Next up, we’re going to implement the search.

The Search Page

Let’s start by defining a single route for the one and only page in our example application. It will display the search form, and also the results when the latitude and longitude are provided as GET parameters by submitting the form.

// Display the search form / run the search
$app->get('/', function (Request $request) use ($app) {

	$resultset = null;

	$query = $app['solr']->createSelect();
	$helper = $query->getHelper();

	$query->setRows(100);

	$query->addSort('score', 'asc');
	
	if (($request->get('lat')) && ($request->get('lng'))) {
		
		$latitude = $request->get('lat');
		$longitude = $request->get('lng');
		$distance = $request->get('dist');

		$query->createFilterQuery('distance')->setQuery(
				$helper->geofilt(
					'latlon', 
					doubleval($latitude),
					doubleval($longitude),
					doubleval($distance)
				)
			);

		$query->setQuery('{!func}' . $helper->geodist(
			'latlon', 
			doubleval($latitude), 
			doubleval($longitude)
		));

		$query->addField('_distance_:' . $helper->geodist(
			'latlon', 
			doubleval($latitude), 
			doubleval($longitude)
			)
		);

		$resultset = $app['solr']->select($query);

	}
		
	// Render the form / search results
	return $app['twig']->render('index.twig', array(
		'resultset' => $resultset,
	));

});

The boilerplate code is pretty simple stuff – defining the route, grabbing the relevant parameters and rendering the view.

The code which runs the search utilizes the code we looked at earlier. Essentially it does the following:

  1. Creates a filter query, restricting the search to within $distance km of the point specified by $latitude and $longitude; all three are provided as GET parameters
  2. Uses the geodist helper to tell Solarium which field we’re interested in (the latlon field we defined earlier) in order to sort the results
  3. Adds a pseudo-field _distance_ so that we can incorporate it into our search results
  4. Runs the query and assigns its result to the view.
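The route passes the GET parameters more or less straight through, and its truthiness check would skip a latitude or longitude of exactly 0, since PHP treats the string "0" as false. Here's a minimal sketch of a stricter approach. The parseSearchParams function is a hypothetical helper of my own, and the allowed distances simply mirror the options in the form's drop-down.

```php
<?php
// Hypothetical helper: validate the lat / lng / dist GET parameters.
// Returns [lat, lng, dist] as floats, or null if the coordinates are unusable.
function parseSearchParams(array $params, array $allowedDistances = [50, 100, 250, 500]): ?array
{
    $lat  = filter_var($params['lat'] ?? null, FILTER_VALIDATE_FLOAT);
    $lng  = filter_var($params['lng'] ?? null, FILTER_VALIDATE_FLOAT);
    $dist = filter_var($params['dist'] ?? null, FILTER_VALIDATE_INT);

    // missing or non-numeric coordinates: nothing to search for
    if ($lat === false || $lng === false) {
        return null;
    }
    if ($lat < -90 || $lat > 90 || $lng < -180 || $lng > 180) {
        return null;
    }
    // fall back to the smallest radius when dist is missing or not in the list
    if ($dist === false || !in_array($dist, $allowedDistances, true)) {
        $dist = $allowedDistances[0];
    }

    return [$lat, $lng, (float) $dist];
}

var_dump(parseSearchParams(['lat' => '0', 'lng' => '13.3333', 'dist' => '100']));
```

In the route, you might call it with $request->query->all() and only run the search when it returns something other than null.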

Displaying the Results

Here’s the portion of the template which is responsible for displaying the search results:

{% if resultset %}
	{% for doc in resultset %}
	<article>
		<h4><i class="icon icon-airplane"></i> {{ doc.name }}</h4>
		<p><strong>{{ doc.city }}</strong>, {{ doc.country}} ({{ doc._distance_|number_format }} km away)</p>
	</article>
	<hr />
	{% endfor %}
{% endif %}

It’s pretty straightforward; note how the _distance_ field is available in our search result document, along with the name, city and country fields. We’re using Twig’s number_format filter to format the distance.

That’s all there is to it – you’ll find the complete example in the repository.

This example only searches based on distance, but you can also combine text-based search with geospatial search – I’ll leave that as an exercise.
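As a starting point for that exercise, here's a rough sketch of the request parameters you might end up sending. The main query does the keyword matching, so the distance ordering moves into the sort parameter. The buildTextGeoParams function is hypothetical; name and latlon are the fields from the example schema, and I haven't run this against the demo index.

```php
<?php
// Illustrative only: keyword search restricted to a radius and ordered by distance.
// buildTextGeoParams() is a hypothetical function, not part of Solarium.
function buildTextGeoParams(string $keywords, float $lat, float $lng, float $distanceKm): array
{
    // escape double quotes and backslashes so the keywords form one quoted phrase
    $phrase = '"' . addcslashes($keywords, '"\\') . '"';
    $geodist = sprintf('geodist(latlon,%F,%F)', $lat, $lng);

    return [
        // text match on the airport name
        'q'    => 'name:' . $phrase,
        // restrict to documents within $distanceKm of the point
        'fq'   => sprintf('{!geofilt sfield=latlon pt=%F,%F d=%F}', $lat, $lng, $distanceKm),
        // include the distance in each result
        'fl'   => '*,_distance_:' . $geodist,
        // closest first, regardless of text relevancy
        'sort' => $geodist . ' asc',
    ];
}

echo http_build_query(buildTextGeoParams('Heathrow', 51.47, -0.45, 100)), "\n";
```

If you'd rather rank by text relevancy, drop the sort parameter and keep the filter.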

Summary

In this article I’ve shown how you can use SOLR – in conjunction with the PHP library Solarium – in order to perform geospatial searches. We’ve looked at some of the theory, then dived into setting up our schema, constructing our query and putting it into practice.

Feedback? Comments? Leave them below!

Frequently Asked Questions on Geospatial Search with Solr and Solarium

What is the significance of geospatial search in Solr and Solarium?

Geospatial search is a powerful feature in Solr and Solarium that allows users to perform searches based on geographic locations. It is particularly useful in applications where location data is crucial, such as real estate, travel, and logistics. With geospatial search, you can query for documents within a certain distance from a point, sort documents by distance, and even aggregate documents by geospatial facets.

How does Solr handle geospatial data?

Solr uses a field type called “location” to handle geospatial data. This field type is used to index latitude and longitude coordinates. When a geospatial search query is made, Solr calculates the distance between the indexed location and the location specified in the search query. This allows Solr to return documents that are within a certain distance from the specified location.

How can I perform a geospatial search in Solarium?

In Solarium, you can perform a geospatial search by using the “geofilt” and “bbox” filters. The “geofilt” filter returns documents that fall within a specified radius of a point, while the “bbox” filter returns documents that fall within a bounding box around a point. To use these filters, you need to specify the field name, the point of origin (in latitude and longitude), and the distance.

What is the difference between “geofilt” and “bbox” filters in Solarium?

The “geofilt” and “bbox” filters in Solarium are both used for geospatial search, but they work in slightly different ways. The “geofilt” filter calculates the exact distance from the point of origin to each document, and returns documents that are within a specified radius. On the other hand, the “bbox” filter calculates a bounding box around the point of origin, and returns documents that fall within this box. The “bbox” filter is faster but less accurate than the “geofilt” filter.
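To illustrate why a bounding box is less accurate, here's a rough sketch that works out the box around a point. It's an approximation for illustration only: boundingBox is a hypothetical function, the 6371 km Earth radius is assumed, and it isn't how Solr computes its own box.

```php
<?php
// Illustrative only: approximate bounding box around a point.
// boundingBox() is hypothetical and breaks down near the poles and the antimeridian.
function boundingBox(float $lat, float $lng, float $distanceKm): array
{
    $earthRadiusKm = 6371.0; // mean Earth radius (assumed)

    // angular distance converted to degrees of latitude
    $latDelta = rad2deg($distanceKm / $earthRadiusKm);
    // a degree of longitude covers less ground as you move away from the equator
    $lngDelta = rad2deg($distanceKm / ($earthRadiusKm * cos(deg2rad($lat))));

    return [
        'minLat' => $lat - $latDelta,
        'maxLat' => $lat + $latDelta,
        'minLng' => $lng - $lngDelta,
        'maxLng' => $lng + $lngDelta,
    ];
}

print_r(boundingBox(52.5167, 13.3333, 50));
```

A document sitting in one of the corners of that box passes a bounding-box test even though it is further than 50 km from the centre, which is the trade-off for not calculating an exact distance.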

How can I sort documents by distance in Solr?

In Solr, you can sort documents by distance using the “geodist” function. This function calculates the distance from a point to each document, and can be used in the “sort” parameter of a search query. For example, with the “sfield” and “pt” parameters set to the location field and the point of origin, you would use: sort=geodist() asc.

Can I perform geospatial search on multiple fields in Solr?

Yes, but not in a single clause. The “sfield” parameter names one location field, so to search across several fields you add a separate spatial filter for each one, naming the field in that filter’s local parameters (for example {!geofilt sfield=latlon}). Each filter query added this way further restricts the results.

How can I improve the performance of geospatial search in Solr?

There are several ways to improve the performance of geospatial search in Solr. One way is to use the “bbox” filter instead of the “geofilt” filter, as it is faster but less accurate. Another way is to use the “RPT” (Recursive Prefix Tree) field type, which is designed for high performance geospatial search.

What is the role of the “SpatialRecursivePrefixTreeFieldType” in Solr?

The “SpatialRecursivePrefixTreeFieldType” in Solr is a field type that is optimized for geospatial search. It uses a spatial index to quickly find documents that are near a specified location. This field type is particularly useful for large datasets, as it can significantly improve the performance of geospatial search queries.

How does Solr handle multi-valued location fields?

Not with the LatLonType field used in this article, which must be single-valued. Solr’s “SpatialRecursivePrefixTreeFieldType” can index several locations per document, in which case a spatial filter matches a document if any of its locations falls within the search area.

Can I use geospatial search with other types of search in Solr?

Yes, you can combine geospatial search with other types of search in Solr. For example, you can use the “fq” parameter to filter the results of a geospatial search query based on other criteria. This allows you to perform complex searches that take into account both location and other factors.

Lukas White

Lukas is a freelance web and mobile developer based in Manchester in the North of England. He's been developing in PHP since moving away from those early days in web development of using all manner of tools such as Java Server Pages, classic ASP and XML data islands, along with JavaScript - back when it really was JavaScript and Netscape ruled the roost. When he's not developing websites and mobile applications and complaining that this was all fields, Lukas likes to cook all manner of World foods.
