Introduction to Elasticsearch in PHP

In this tutorial, we’re going to take a look at Elasticsearch and how we can use it in PHP. Elasticsearch is an open-source search server based on Apache Lucene. We can use it to perform fast full-text and other complex searches. It also includes a REST API which allows us to easily issue requests for creating, deleting, updating and retrieving data.

Installing Elasticsearch

The installation instructions below assume you’re using a Debian-based environment such as Ubuntu.

Elasticsearch runs on the JVM, so we first need to install Java. The Oracle Java installer used here is not available in the repositories that Ubuntu uses by default, so we need to add one.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update

Once that’s done, we can install Java.

sudo apt-get install oracle-java8-installer

Next, let’s download Elasticsearch using wget.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.5.2.tar.gz

At the time of writing, the most recent stable version is 1.5.2, so that is what we used above. If you want to make sure you get the most recent version, take a look at the Elasticsearch downloads page.

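Then, we extract the archive and start Elasticsearch. The archive unpacks into its own elasticsearch-1.5.2 directory, so that is where we need to run the binary from.

mkdir es
tar -xf elasticsearch-1.5.2.tar.gz -C es
cd es/elasticsearch-1.5.2
./bin/elasticsearch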

When we access http://localhost:9200 in the browser, we get something similar to the following:

{
  "status" : 200,
  "name" : "Rumiko Fujikawa",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.5.2",
    "build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp" : "2015-04-27T09:21:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

Using Elasticsearch

Now we can start playing with Elasticsearch. First, let’s install the official Elasticsearch client for PHP.

composer require elasticsearch/elasticsearch

Next, let’s create a new PHP file for testing and add the following code so that we can use the Elasticsearch client.

<?php
require 'vendor/autoload.php';

$client = new Elasticsearch\Client();

Indexing Documents

Indexing new documents can be done by calling the index method on the client. This method accepts an array as its argument, which should contain the body, index and type keys. The body is an array containing the data that you want to index. The index is the location where the document is stored (it corresponds to a database in a traditional RDBMS). Lastly, the type is how you want to categorize the document; it’s like a table in RDBMS land. Here’s an example:

$params = array();
$params['body']  = array(
  'name' => 'Ash Ketchum',
  'age' => 10,
  'badges' => 8 
);

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$result = $client->index($params);

If you print out the $result you get something similar to the following:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => AU1Bn51W5l_vSaLQKPOy
    [_version] => 1
    [created] => 1
)

In the example above, we haven’t specified an ID for the document. Elasticsearch automatically assigns a unique ID if nothing is specified. Let’s try assigning an ID to another document:

$params = array();
$params['body']  = array(
  'name' => 'Brock',
  'age' => 15,
  'badges' => 0 
);

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['id'] = '1A-001';

$result = $client->index($params);

When we print the $result:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 1
    [created] => 1
)
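
The two calls above differ only in the optional id key, so the parameter array can be assembled by a small helper. The following is a minimal sketch: buildIndexParams is a hypothetical name introduced here for illustration, not part of the Elasticsearch client, and the actual index call is left commented out because it needs a running server.

```php
<?php
// Hypothetical helper (not part of the Elasticsearch client): assembles the
// array that the tutorial passes to $client->index().
function buildIndexParams($index, $type, array $body, $id = null)
{
    $params = array(
        'index' => $index,
        'type'  => $type,
        'body'  => $body,
    );

    // Only set 'id' when one is supplied; otherwise Elasticsearch assigns one.
    if ($id !== null) {
        $params['id'] = $id;
    }

    return $params;
}

$params = buildIndexParams(
    'pokemon',
    'pokemon_trainer',
    array('name' => 'Brock', 'age' => 15, 'badges' => 0),
    '1A-001'
);
// $result = $client->index($params); // requires a running Elasticsearch server
```

Leaving out the last argument produces an array without the id key, which matches the first example where Elasticsearch generated the ID.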

When indexing documents, we’re not limited to a single-dimensional array. We can also index multi-dimensional ones:

$params = array();
$params['body']  = array(
  'name' => 'Misty',
  'age' => 13,
  'badges' => 0,
  'pokemon' => array(
    'psyduck' => array(
      'type' => 'water',
      'moves' => array(
        'Water Gun' => array(
          'pp' => 25,
          'power' => 40
        )
      ) 
    )
  ) 
);

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['id'] = '1A-002';

$result = $client->index($params);

We can go as deep as we want, but we still need to observe proper storage of data (not going too deep, keeping it structured and logical, etc) when we index it with Elasticsearch, just like we do in an RDBMS setting.
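
Elasticsearch stores documents as JSON, so it can help to look at what a nested body looks like once it is encoded. The sketch below uses PHP’s built-in json_encode on the same body as above; it only illustrates the shape of the data and does not talk to Elasticsearch.

```php
<?php
// The same nested body used for Misty above.
$body = array(
    'name' => 'Misty',
    'age' => 13,
    'badges' => 0,
    'pokemon' => array(
        'psyduck' => array(
            'type' => 'water',
            'moves' => array(
                'Water Gun' => array('pp' => 25, 'power' => 40),
            ),
        ),
    ),
);

// Associative arrays are encoded as JSON objects, so the nesting is preserved.
$json = json_encode($body);
echo $json, PHP_EOL;

// Decoding it back gives the same structure we started with.
$decoded = json_decode($json, true);
```

Each extra level of nesting becomes one more level of JSON objects, which is a quick way to judge whether a document is getting too deep.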

Searching for Documents

We can search for existing documents within a specific index using either the get or search method. The main distinction between the two is that the get method is commonly used when you already know the ID of the document. It’s also used for getting only a single document. On the other hand, the search method is used for searching multiple documents, and you can use any field in the document for your query.

Get

First, let’s start with the get method. Just like the index method, this one accepts an array as its argument. The array should contain the index, type and id of the document that you want to find.

$params = array();
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['id'] = '1A-001';

$result = $client->get($params);

The code above would return the following:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 1
    [found] => 1
    [_source] => Array
        (
            [name] => Brock
            [age] => 15
            [badges] => 0
        )

)

Search with Specific Fields

The array argument for the search method needs to have the index, the type and the body keys. The body is where we specify the query. To start, here’s an example on how we use it to return all the documents which have an age of 15.

$params = array(); // start clean so keys from earlier examples don't leak in
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['body']['query']['match']['age'] = 15;

$result = $client->search($params);

This returns the following:

Array
(
    [took] => 177
    [timed_out] => 
    [_shards] => Array
        (
            [total] => 5
            [successful] => 5
            [failed] => 0
        )

    [hits] => Array
        (
            [total] => 1
            [max_score] => 1
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => pokemon
                            [_type] => pokemon_trainer
                            [_id] => 1A-001
                            [_score] => 1
                            [_source] => Array
                                (
                                    [name] => Brock
                                    [age] => 15
                                    [badges] => 0
                                )

                        )

                )

        )

)

Let’s break the results down:

  • took – number of milliseconds it took for the request to finish.
  • timed_out – returns true if the request timed out.
  • _shards – by default, Elasticsearch distributes the data into 5 shards. If you get 5 as the value for total and successful then every shard is currently healthy. You can find a more detailed explanation in this Stack Overflow thread.
  • hits – contains the results: the total number of matches, the highest score, and the matching documents themselves in the nested hits array.
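
Since the documents sit two levels down in hits.hits, a small helper can pull them out. This is a sketch only: extractSources is a hypothetical name, and the $result array is a trimmed-down copy of the response shown above rather than the output of a live call.

```php
<?php
// Hypothetical helper: pulls the matching documents out of a search response
// shaped like the one printed above.
function extractSources(array $result)
{
    $sources = array();

    // Guard against a response that has no hits array at all.
    if (!isset($result['hits']['hits'])) {
        return $sources;
    }

    foreach ($result['hits']['hits'] as $hit) {
        // Key each document by its _id so it can be looked up later.
        $sources[$hit['_id']] = $hit['_source'];
    }

    return $sources;
}

// Trimmed-down copy of the response shown above, used instead of a live call.
$result = array(
    'took' => 177,
    'timed_out' => false,
    'hits' => array(
        'total' => 1,
        'hits' => array(
            array(
                '_id' => '1A-001',
                '_source' => array('name' => 'Brock', 'age' => 15, 'badges' => 0),
            ),
        ),
    ),
);

$trainers = extractSources($result);
echo $trainers['1A-001']['name'], PHP_EOL; // prints "Brock"
```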

The match query shown earlier targets a first-level field. To reach a field further down, we write its full path using . as the separator, starting from the first-level field down to the field we want to use as a query. In the example below, the match clause is placed inside a bool query: we specify bool as an item for the query, followed by must.

$params = array(); // reset so the earlier match query isn't sent alongside bool
$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['body']['query']['bool']['must'][]['match']['pokemon.psyduck.type'] = 'water';
$result = $client->search($params);
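
The dotted path can also be assembled from its parts. The sketch below builds the same query as the chained assignment above; buildNestedMatch is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: builds a bool/must match clause for a nested field.
// $path lists the keys from the first-level field down to the target field.
function buildNestedMatch(array $path, $value)
{
    $field = implode('.', $path); // e.g. "pokemon.psyduck.type"

    return array(
        'bool' => array(
            'must' => array(
                array('match' => array($field => $value)),
            ),
        ),
    );
}

$params = array(
    'index' => 'pokemon',
    'type'  => 'pokemon_trainer',
    'body'  => array(
        'query' => buildNestedMatch(array('pokemon', 'psyduck', 'type'), 'water'),
    ),
);
// $result = $client->search($params); // requires a running Elasticsearch server
```

Writing the query as a nested array literal makes its final shape visible at a glance, which the one-line chained assignment hides.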

Searching with Arrays

We can search using arrays as the query (to match several values) by specifying the bool item, followed by must, terms and then the field we want to use for the query. We specify an array containing the values that we want to match. In the example below we’re selecting documents which have an age that is equal to either 10 or 15.

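$params = array(); // reset so earlier query clauses don't carry over
$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$params['body']['query']['bool']['must']['terms']['age'] = array(10, 15);

$result = $client->search($params);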

This method only accepts one-dimensional arrays.
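
Because a nested array would not be a valid list of values, it can be worth checking the input before building the query. The following sketch does that; buildTermsQuery is a hypothetical helper, and the validation is an illustration rather than something the client requires.

```php
<?php
// Hypothetical helper: builds a terms clause and rejects nested arrays,
// since terms expects a flat list of values.
function buildTermsQuery($field, array $values)
{
    foreach ($values as $value) {
        if (!is_scalar($value)) {
            throw new InvalidArgumentException('terms values must be scalars.');
        }
    }

    return array(
        'bool' => array(
            'must' => array(
                // array_values() drops custom keys so the list stays a plain list
                'terms' => array($field => array_values($values)),
            ),
        ),
    );
}

$query = buildTermsQuery('age', array(10, 15));
```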

Next, let’s do a filtered search. To use filtered search, we have to specify the filtered item and set the range that we want to return for a specific field. In the example below, we’re using the age as the field. We’re selecting documents which have ages greater than or equal to (gte) 11 but less than or equal to (lte) 20.

$params = array(); // reset so earlier query clauses don't carry over
$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';
$params['body']['query']['filtered']['filter']['range']['age']['gte'] = 11;
$params['body']['query']['filtered']['filter']['range']['age']['lte'] = 20;
$result = $client->search($params);
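
The range filter above can be wrapped in a helper that also allows one side of the range to stay open. This is a sketch under the same Elasticsearch 1.x filtered syntax used in this tutorial; buildRangeQuery is a hypothetical name.

```php
<?php
// Hypothetical helper: builds the "filtered" query with a range filter.
// Pass null for a bound to leave that side of the range open.
function buildRangeQuery($field, $gte = null, $lte = null)
{
    $range = array();
    if ($gte !== null) {
        $range['gte'] = $gte;
    }
    if ($lte !== null) {
        $range['lte'] = $lte;
    }

    return array(
        'filtered' => array(
            'filter' => array(
                'range' => array($field => $range),
            ),
        ),
    );
}

// Same bounds as the example above: 11 <= age <= 20.
$query = buildRangeQuery('age', 11, 20);
```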

OR and AND

In RDBMS land we are used to using the AND and OR keywords to specify two or more conditions. We can also do that with Elasticsearch using filtered search. In the example below we’re using the and filter to select documents which have an age of 10 and a badge count of 8. Only the documents which match both criteria are returned.

$params = array(); // reset so earlier filters don't carry over
$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$params['body']['query']['filtered']['filter']['and'][]['term']['age'] = 10;
$params['body']['query']['filtered']['filter']['and'][]['term']['badges'] = 8;

$result = $client->search($params);

If you want to select either of those then you can use or instead.

$params['body']['query']['filtered']['filter']['or'][]['term']['age'] = 10;
$params['body']['query']['filtered']['filter']['or'][]['term']['badges'] = 8;
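
Since the and and or variants differ only in one key, both can come from the same helper. The sketch below produces the same structure as the chained assignments above; buildTermFilter is a hypothetical name introduced for illustration.

```php
<?php
// Hypothetical helper: combines term filters with the "and" or "or" filter.
function buildTermFilter($operator, array $terms)
{
    if ($operator !== 'and' && $operator !== 'or') {
        throw new InvalidArgumentException('Operator must be "and" or "or".');
    }

    $clauses = array();
    foreach ($terms as $field => $value) {
        // One term clause per field/value pair.
        $clauses[] = array('term' => array($field => $value));
    }

    return array('filtered' => array('filter' => array($operator => $clauses)));
}

$either = buildTermFilter('or', array('age' => 10, 'badges' => 8));
$both   = buildTermFilter('and', array('age' => 10, 'badges' => 8));
```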

Limiting Results

Results can be limited to a specific number by specifying the size field. Here’s an example:

$params['body']['query']['filtered']['filter']['and'][]['term']['age'] = 10;
$params['body']['query']['filtered']['filter']['and'][]['term']['badges'] = 8;
$params['size'] = 1;

This returns the first result since we limited the results to just one document.

Pagination

In RDBMS land we have the limit and offset. In Elasticsearch we have size and from. from allows us to specify the index of the first result in the resultset. Documents are zero-indexed. So for 10 results per page, if we have a size of 10, we add 10 to the from value every time the user navigates to the next page.

$params['index'] = 'pokemon';
$params['type']  = 'pokemon_trainer';

$params['size'] = 10;
$params['from'] = 10; // <-- will return second page
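
The from value for any page follows directly from the page number and the page size. A minimal sketch, assuming 1-based page numbers as a user would see them; buildPageParams is a hypothetical helper name.

```php
<?php
// Hypothetical helper: converts a 1-based page number into size/from values.
function buildPageParams($page, $perPage = 10)
{
    $page = max(1, (int) $page); // treat anything below 1 as the first page

    return array(
        'size' => $perPage,
        'from' => ($page - 1) * $perPage, // results are zero-indexed
    );
}

$params = array('index' => 'pokemon', 'type' => 'pokemon_trainer');
$params = array_merge($params, buildPageParams(2)); // size 10, from 10
```

With 10 results per page, page 1 starts at from 0, page 2 at 10, page 3 at 20, and so on.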

Updating a Document

One way to update a document is to first fetch its current data. To do that, we specify the index, type and the id like we did earlier and then we call the get method. The current data can be found in the _source item. All we have to do is update the current fields with new values or add new fields to that item. Finally, we call the update method with the same parameters used for the get method, passing the modified data under the doc key of the body. The fields under doc are merged into the existing document, so sending only the changed fields works as well.

$params = array();
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['id'] = '1A-001';
$result = $client->get($params);


$result['_source']['age'] = 21; //update existing field with new value

//add new field
$result['_source']['pokemon'] = array(
  'Onix' => array(
    'type' => 'rock',
    'moves' => array(
      'Rock Slide' => array(
        'power' => 100,
        'pp' => 40
      ),
      'Earthquake' => array(
        'power' => 200,
        'pp' => 100
      )
    )
  )
);

$params['body']['doc'] = $result['_source'];

$result = $client->update($params);

This returns something similar to the following:

Array
(
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 2
)

Note that the _version is incremented every time you call the update method, regardless of whether things have actually been updated.

You might be wondering why we have a version in the document or even be tempted to think that there’s a functionality in Elasticsearch that allows us to fetch a previous version of a document. Unfortunately, that isn’t so. The version merely serves as a counter as to how many times a document was updated.
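
The update parameters can also be built with only the fields that changed. The sketch below runs locally without a server: buildUpdateParams is a hypothetical helper, and the $expected array is only an illustration of a field-by-field merge, not output from Elasticsearch.

```php
<?php
// Hypothetical helper: builds the array passed to $client->update(),
// wrapping the changed fields in the 'doc' key as in the example above.
function buildUpdateParams($index, $type, $id, array $changes)
{
    return array(
        'index' => $index,
        'type'  => $type,
        'id'    => $id,
        'body'  => array('doc' => $changes),
    );
}

// Stand-in for the _source returned by get, so no live server is needed.
$source  = array('name' => 'Brock', 'age' => 15, 'badges' => 0);
$changes = array('age' => 21);

// array_replace overwrites matching keys and keeps the rest, which mirrors
// a top-level merge of the changes into the current document.
$expected = array_replace($source, $changes);

$params = buildUpdateParams('pokemon', 'pokemon_trainer', '1A-001', $changes);
// $result = $client->update($params); // requires a running Elasticsearch server
```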

Deleting a Document

Deleting a document can be done by calling the delete method. This method accepts an array containing the index, type and id as its argument.

$params = array();
$params['index'] = 'pokemon';
$params['type'] = 'pokemon_trainer';
$params['id'] = '1A-001';

$result = $client->delete($params);

This returns the following:

Array
(
    [found] => 1
    [_index] => pokemon
    [_type] => pokemon_trainer
    [_id] => 1A-001
    [_version] => 7
)

Note that you will get an error if you try to fetch a deleted document using the get method.
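
One way to deal with this is to wrap the lookup and treat the failure as "no document". The following is a sketch only: fetchOrNull is a hypothetical wrapper, the closures stand in for calls to the client, and it catches the generic Exception class; in real code you would normally catch the narrowest exception type the client throws.

```php
<?php
// Hypothetical wrapper: runs a lookup and returns null instead of letting an
// exception bubble up. In real code the callable would wrap $client->get($params).
function fetchOrNull($lookup)
{
    try {
        return call_user_func($lookup);
    } catch (Exception $e) {
        // The lookup failed, so report "nothing found" to the caller.
        return null;
    }
}

// Stand-in for fetching a deleted document.
$missing = fetchOrNull(function () {
    throw new RuntimeException('document not found');
});

// Stand-in for a successful lookup.
$found = fetchOrNull(function () {
    return array('_id' => '1A-001', 'found' => true);
});
```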

Conclusion

In this article, we looked at how we can work with Elasticsearch in PHP using the official Elasticsearch client. Specifically, we’ve taken a look at how to index new documents, search for documents, paginate results, update documents, and delete documents.

Overall, Elasticsearch is a nice way to add search functionality to your PHP applications. If you want to learn more about how to integrate Elasticsearch into your PHP applications, you can check out Daniel Sipos’ series on how to integrate Elasticsearch with Drupal and Silex.

If, however, you prefer more automatic solutions to adding in-depth search functionality to your applications, see this series.

Frequently Asked Questions (FAQs) about Elasticsearch in PHP

What is the basic structure of an Elasticsearch query in PHP?

An Elasticsearch query in PHP is structured as an associative array. The array contains key-value pairs that define the index you’re querying, the type of search you’re performing, and the actual search parameters. For instance, a simple match query might look like this:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'match' => [
                'testField' => 'abc'
            ]
        ]
    ]
];
$response = $client->search($params);

In this example, ‘my_index’ is the index you’re searching, ‘my_type’ is the type of document you’re looking for, and ‘testField’ => ‘abc’ is the actual search query.

How can I handle errors in Elasticsearch PHP?

Error handling in Elasticsearch PHP can be done using try-catch blocks. When an operation fails, the Elasticsearch client will throw an exception that you can catch and handle. For example:

try {
    $response = $client->search($params);
} catch (Elasticsearch\Common\Exceptions\BadRequest400Exception $e) {
    // handle exception...
}

In this example, if the search operation fails, an exception of type BadRequest400Exception is thrown. You can catch this exception and handle it as needed.

How can I index a document in Elasticsearch using PHP?

Indexing a document in Elasticsearch using PHP involves creating an array that represents the document and then passing that array to the index() method of the Elasticsearch client. Here’s an example:

$doc = [
    'id' => '1',
    'title' => 'Elasticsearch: cool. bonsai cool.',
    'name' => 'bonsai tree',
    'age' => 13,
    'lives' => 'Australia',
    'about' => 'Bonsai artist'
];
$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => $doc
];
$response = $client->index($params);

In this example, the $doc array represents the document to be indexed. The ‘index’, ‘type’, and ‘id’ keys in the $params array specify the index, type, and ID of the document.

How can I delete a document from Elasticsearch using PHP?

Deleting a document from Elasticsearch using PHP can be done using the delete() method of the Elasticsearch client. You need to specify the index, type, and ID of the document you want to delete. Here’s an example:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id'
];
$response = $client->delete($params);

In this example, the ‘index’, ‘type’, and ‘id’ keys in the $params array specify the index, type, and ID of the document to be deleted.

How can I update a document in Elasticsearch using PHP?

Updating a document in Elasticsearch using PHP can be done using the update() method of the Elasticsearch client. You need to specify the index, type, and ID of the document you want to update, as well as the new data for the document. Here’s an example:

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => 'my_id',
    'body' => [
        'doc' => [
            'name' => 'new name'
        ]
    ]
];
$response = $client->update($params);

In this example, the ‘index’, ‘type’, and ‘id’ keys in the $params array specify the index, type, and ID of the document to be updated. The ‘doc’ key in the ‘body’ array contains the new data for the document.

Wern Ancheta

Wern is a web developer from the Philippines. He loves building things for the web and sharing the things he has learned by writing in his blog. When he's not coding or learning something new, he enjoys watching anime and playing video games.
