[#es-connectors-tutorial-api]
=== Connector API tutorial
++++
<titleabbrev>API tutorial</titleabbrev>
++++

Learn how to set up a self-managed connector using the {ref}/connector-apis.html[{es} Connector APIs].

For this example, we'll use the <<es-connectors-postgresql,PostgreSQL connector>> to sync data from a PostgreSQL database to {es}.
We'll spin up a simple PostgreSQL instance in Docker with some example data, create a connector, and sync the data to {es}.
You can follow the same steps to set up a connector for another data source.

[TIP]
====
This tutorial focuses on running a self-managed connector on your own infrastructure, and managing syncs using the Connector APIs.
See <<es-connectors,connectors>> for an overview of how connectors work.

If you're just getting started with {es}, this tutorial might be a bit advanced.
Refer to the {ref}/getting-started.html[quickstart] for a more beginner-friendly introduction to {es}.

If you're just getting started with connectors, you might want to start in the UI first.
We have two tutorials that focus on managing connectors using the UI:

* <<es-mongodb-start,Elastic managed connector tutorial>>. Set up a native MongoDB connector, fully managed in Elastic Cloud.
* <<es-postgresql-connector-client-tutorial,Self-managed connector tutorial>>. Set up a self-managed PostgreSQL connector.
====

[discrete#es-connectors-tutorial-api-prerequisites]
==== Prerequisites

* You should be familiar with how <<es-connectors,connectors>> work, to understand how the API calls relate to the overall connector setup.
* You need to have https://www.docker.com/products/docker-desktop/[Docker Desktop] installed.
* You need to have {es} running, and an API key to access it.
If you don't have an {es} deployment yet, refer to the next section for details.

[discrete#es-connectors-tutorial-api-setup-es]
==== Set up {es}

If you already have an {es} deployment on Elastic Cloud (_Hosted deployment_ or _Serverless project_), you're good to go.
To spin up {es} in local dev mode in Docker for testing purposes, open the collapsible section below.

.*Run local {es} in Docker*
[%collapsible]
===============

[source,sh,subs="attributes+"]
----
docker run -p 9200:9200 -d --name elasticsearch \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "xpack.security.http.ssl.enabled=false" \
  -e "xpack.license.self_generated.type=trial" \
  docker.elastic.co/elasticsearch/elasticsearch:{version}
----

[WARNING]
====
This {es} setup is for development purposes only.
Never use this configuration in production.
Refer to {ref}/setup.html[Set up {es}] for production-grade installation instructions, including Docker.
====

We will use the default password `changeme` for the `elastic` user. For production environments, always ensure your cluster runs with security enabled.

[source,sh]
----
export ELASTIC_PASSWORD="changeme"
----

Since we run our cluster locally with security disabled, we won't use API keys to authenticate against {es}. Instead, in each cURL request, we'll use the `-u` flag for authentication.

Let's test that we can access {es}:

[source,sh]
----
curl -s -X GET -u elastic:$ELASTIC_PASSWORD http://localhost:9200
----
// NOTCONSOLE

Note: With {es} running locally, you will need to pass the username and password in the connector service's configuration file to authenticate against {es}.
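
For example, a minimal sketch of the relevant settings in the connector service's `config.yml` might look like this (assuming the local, security-disabled setup above; `elasticsearch.username` and `elasticsearch.password` are the keys used in the https://raw.githubusercontent.com/elastic/connectors/main/config.yml.example[example configuration file]):

[source,yaml]
----
# Local dev cluster without API keys: authenticate with basic credentials instead.
elasticsearch.host: http://localhost:9200
elasticsearch.username: elastic
elasticsearch.password: changeme
----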

===============

.Running API calls
****
You can run API calls using the https://www.elastic.co/guide/en/kibana/master/console-kibana.html[Dev Tools Console] in Kibana, using `curl` in your terminal, or with our programming language clients.
Our example widget allows you to copy code examples in both Dev Tools Console syntax and curl syntax.
To use curl, you'll need to add authentication headers to your request.

Here's an example of how to do that. Note that if you want the connector ID to be auto-generated, use the `POST _connector` endpoint.

[source,sh]
----
curl -s -X PUT http://localhost:9200/_connector/my-connector-id \
-H "Authorization: ApiKey $APIKEY" \
-H "Content-Type: application/json" \
-d '{
  "name": "Music catalog",
  "index_name": "music",
  "service_type": "postgresql"
}'
----
// NOTCONSOLE

Refer to <<es-connectors-tutorial-api-create-api-key>> for instructions on creating an API key.
****

[discrete#es-connectors-tutorial-api-setup-postgres]
==== Run PostgreSQL instance in Docker (optional)

For this tutorial, we'll set up a PostgreSQL instance in Docker with some example data.
Of course, you can *skip this step and use your own existing PostgreSQL instance* if you have one.
Keep in mind that using a different instance might require adjustments to the connector configuration described in the next steps.

.*Expand* to run a simple PostgreSQL instance in Docker and import example data
[%collapsible]
===============

Let's launch a PostgreSQL container with a user and password, exposed at port `5432`:

[source,sh]
----
docker run --name postgres -e POSTGRES_USER=myuser -e POSTGRES_PASSWORD=mypassword -p 5432:5432 -d postgres
----

*Download and import example data*

Next, we need to create a directory to store our example dataset for this tutorial.
In your terminal, run the following command:

[source,sh]
----
mkdir -p ~/data
----

We will use the https://github.com/lerocha/chinook-database/blob/master/ChinookDatabase/DataSources/Chinook_PostgreSql.sql[Chinook dataset] example data.

Run the following command to download the file to the `~/data` directory:

[source,sh]
----
curl -L https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_PostgreSql.sql -o ~/data/Chinook_PostgreSql.sql
----
// NOTCONSOLE

Now we need to import the example data into the PostgreSQL container and create the tables.

Run the following Docker commands to copy our sample data into the container and execute the `psql` script:

[source,sh]
----
docker cp ~/data/Chinook_PostgreSql.sql postgres:/
docker exec -it postgres psql -U myuser -f /Chinook_PostgreSql.sql
----

Let's verify that the tables are created correctly in the `chinook` database:

[source,sh]
----
docker exec -it postgres psql -U myuser -d chinook -c "\dt"
----

The `album` table should contain *347* entries and the `artist` table should contain *275* entries.
===============

This tutorial uses a very basic setup. To use advanced functionality such as filtering rules and incremental syncs, enable `track_commit_timestamp` on your PostgreSQL database. Refer to the <<es-postgresql-connector-client-tutorial,self-managed connector tutorial>> for more details.
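
If you're using the Docker container from this tutorial, a minimal sketch of enabling that setting looks like the following (the parameter only takes effect after a server restart):

[source,sh]
----
# Enable commit timestamp tracking, then restart PostgreSQL to apply it.
docker exec -it postgres psql -U myuser -c "ALTER SYSTEM SET track_commit_timestamp = 'on';"
docker restart postgres
----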

Now it's time for the real fun! We'll set up a connector to create a searchable mirror of our PostgreSQL data in {es}.

[discrete#es-connectors-tutorial-api-create-connector]
==== Create a connector

We'll use the https://www.elastic.co/guide/en/elasticsearch/reference/master/create-connector-api.html[Create connector API] to create a PostgreSQL connector instance.

Run the following API call, using the https://www.elastic.co/guide/en/kibana/master/console-kibana.html[Dev Tools Console] or `curl`:

[source,console]
----
PUT _connector/my-connector-id
{
  "name": "Music catalog",
  "index_name": "music",
  "service_type": "postgresql"
}
----
// TEST[skip:TODO]

[TIP]
====
`service_type` refers to the third-party data source you're connecting to.
====

Note that we specified the `my-connector-id` ID as a part of the `PUT` request.
We'll need the connector ID to set up and run the connector service locally.

If you'd prefer to use an autogenerated ID, replace `PUT _connector/my-connector-id` with `POST _connector`.
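
For example, the equivalent request with an autogenerated ID would look like this (the generated ID is then returned in the response):

[source,console]
----
POST _connector
{
  "name": "Music catalog",
  "index_name": "music",
  "service_type": "postgresql"
}
----
// TEST[skip:TODO]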

[discrete#es-connectors-tutorial-api-deploy-connector]
==== Run connector service

[NOTE]
====
The connector service runs automatically in Elastic Cloud if you're using Elastic managed connectors.
Because we're running a self-managed connector, we need to spin up this service locally.
====

Now we'll run the connector service so we can start syncing data from our PostgreSQL instance to {es}.
We'll use the steps outlined in <<es-connectors-run-from-docker,Run from a Docker container>>.

When running the connector service on your own infrastructure, you need to provide a configuration file with the following details:

* Your {es} endpoint (`elasticsearch.host`)
* An {es} API key (`elasticsearch.api_key`)
* Your third-party data source type (`service_type`)
* Your connector ID (`connector_id`)

[discrete#es-connectors-tutorial-api-create-api-key]
===== Create an API key

If you haven't already created an API key to access {es}, you can use the {ref}/security-api-create-api-key.html[_security/api_key] endpoint.

Here, we assume your target {es} index name is `music`. If you use a different index name, adjust the request body accordingly.

[source,console]
----
POST /_security/api_key
{
  "name": "music-connector",
  "role_descriptors": {
    "music-connector-role": {
      "cluster": [
        "monitor",
        "manage_connector"
      ],
      "indices": [
        {
          "names": [
            "music",
            ".search-acl-filter-music",
            ".elastic-connectors*"
          ],
          "privileges": [
            "all"
          ],
          "allow_restricted_indices": false
        }
      ]
    }
  }
}
----
// TEST[skip:TODO]
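
The response will look something like this (the values below are illustrative, not real credentials):

[source,js]
----
{
  "id": "abc123",
  "name": "music-connector",
  "api_key": "aBcDeF...",
  "encoded": "YWJjMTIzOmFCY0RlRi4uLg=="
}
----
// NOTCONSOLE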

You'll need to use the `encoded` value from the response as the `elasticsearch.api_key` in your configuration file.

[TIP]
====
You can also create an API key in the {kib} and Serverless UIs.
====

[discrete#es-connectors-tutorial-api-prepare-configuration-file]
===== Prepare the configuration file

Let's create a directory and a `config.yml` file to store the connector configuration:

[source,sh]
----
mkdir -p ~/connectors-config
touch ~/connectors-config/config.yml
----

Now, let's add our connector details to the config file.
Open `config.yml` and paste the following configuration, replacing placeholders with your own values:

[source,yaml]
----
elasticsearch.host: <ELASTICSEARCH_ENDPOINT> # Your Elasticsearch endpoint
elasticsearch.api_key: <ELASTICSEARCH_API_KEY> # Your Elasticsearch API key

connectors:
  - connector_id: "my-connector-id"
    service_type: "postgresql"
----

We provide an https://raw.githubusercontent.com/elastic/connectors/main/config.yml.example[example configuration file] in the `elastic/connectors` repository for reference.

[discrete#es-connectors-tutorial-api-run-connector-service]
===== Run the connector service

Now that we have the configuration file set up, we can run the connector service locally.
This will point your connector instance at your {es} deployment.

Run the following Docker command to start the connector service:

[source,sh,subs="attributes+"]
----
docker run \
  -v "$HOME/connectors-config:/config" \
  --rm \
  --tty -i \
  --network host \
  docker.elastic.co/enterprise-search/elastic-connectors:{version}.0 \
  /app/bin/elastic-ingest \
  -c /config/config.yml
----

Verify your connector is connected by checking the connector status (it should be `needs_configuration`) and the `last_seen` field (note that time is reported in UTC).
The `last_seen` field indicates that the connector successfully connected to {es}.

[source,console]
----
GET _connector/my-connector-id
----
// TEST[skip:TODO]
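
A trimmed, illustrative response might look like the following (your values and timestamps will differ, and most fields are omitted here):

[source,js]
----
{
  "id": "my-connector-id",
  "index_name": "music",
  "service_type": "postgresql",
  "status": "needs_configuration",
  "last_seen": "2024-05-01T12:34:56.000000+00:00"
}
----
// NOTCONSOLE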

[discrete#es-connectors-tutorial-api-update-connector-configuration]
==== Configure connector

Now our connector instance is up and running, but it doesn't yet know _where_ to sync data from.
The final piece of the puzzle is to configure our connector with details about our PostgreSQL instance.
When setting up a connector in the Elastic Cloud or Serverless UIs, you're prompted to add these details in the user interface.

But because this tutorial is all about working with connectors _programmatically_, we'll use the {ref}/update-connector-configuration-api.html[Update connector configuration API] to add our configuration details.

[TIP]
====
Before configuring the connector, ensure that the configuration schema is registered by the service.
For Elastic managed connectors, this occurs shortly after creation via the API.
For self-managed connectors, the schema registers on service startup (once the `config.yml` is populated).

Configuration updates via the API are possible only _after schema registration_.
Verify this by checking the `configuration` property returned by the `GET _connector/my-connector-id` request.
It should be non-empty.
====

Run the following API call to configure the connector with our <<es-connectors-postgresql-client-configuration,PostgreSQL configuration details>>:

[source,console]
----
PUT _connector/my-connector-id/_configuration
{
  "values": {
    "host": "127.0.0.1",
    "port": 5432,
    "username": "myuser",
    "password": "mypassword",
    "database": "chinook",
    "schema": "public",
    "tables": "album,artist"
  }
}
----
// TEST[skip:TODO]

[NOTE]
====
Configuration details are specific to the connector type.
The keys and values will differ depending on which third-party data source you're connecting to.
Refer to the individual <<es-connectors-references,connector references>> for these configuration details.
====

[discrete#es-connectors-tutorial-api-sync]
==== Sync data

[NOTE]
====
We're using a self-managed connector in this tutorial.
To use these APIs with an Elastic managed connector, there's some extra setup for API keys.
Refer to <<es-native-connectors-manage-API-keys,Manage API keys>> for details.
====

We're now ready to sync our PostgreSQL data to {es}.
Run the following API call to start a full sync job:

[source,console]
----
POST _connector/_sync_job
{
  "id": "my-connector-id",
  "job_type": "full"
}
----
// TEST[skip:TODO]

To store data in {es}, the connector needs to create an index.
When we created the connector, we specified the `music` index.
The connector will create and configure this {es} index before launching the sync job.

[TIP]
====
In the approach we've used here, the connector will use {ref}/mapping.html#mapping-dynamic[dynamic mappings] to automatically infer the data types of your fields.
In a real-world scenario, you would use the {es} {ref}/indices-create-index.html[Create index API] to first create the index with the desired field mappings and index settings.
Defining your own mappings upfront gives you more control over how your data is indexed.
====
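
For example, a minimal sketch of creating the index upfront with explicit mappings might look like this (the field names here are illustrative guesses based on the Chinook `album` table, not the exact fields the connector produces):

[source,console]
----
PUT music
{
  "mappings": {
    "properties": {
      "title":     { "type": "text" },
      "artist_id": { "type": "integer" }
    }
  }
}
----
// TEST[skip:TODO]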

[discrete#es-connectors-tutorial-api-check-sync-status]
===== Check sync status

Use the {ref}/get-connector-sync-job-api.html[Get sync job API] to track the status and progress of the sync job.
By default, the most recent job statuses are returned first.
Run the following API call to check the status of the sync job:

[source,console]
----
GET _connector/_sync_job?connector_id=my-connector-id&size=1
----
// TEST[skip:TODO]

The job document will be updated as the sync progresses; you can poll it as often as you'd like to check for updates.

Once the job completes, the status should be `completed` and `indexed_document_count` should be *622*.
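
A trimmed, illustrative response for a finished job might look like this (most fields omitted, and the job ID is hypothetical):

[source,js]
----
{
  "count": 1,
  "results": [
    {
      "id": "my-sync-job-id",
      "status": "completed",
      "indexed_document_count": 622
    }
  ]
}
----
// NOTCONSOLE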

Verify that data is present in the `music` index with the following API call:

[source,console]
----
GET music/_count
----
// TEST[skip:TODO]

{es} stores data in documents, which are JSON objects.
List the individual documents with the following API call:

[source,console]
----
GET music/_search
----
// TEST[skip:TODO]
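
To go a step further, you can run a quick full-text query. This sketch assumes the sync produced a text field named `title` from the Chinook `album` table (an assumption about the inferred schema; check the `_search` output above for the actual field names):

[source,console]
----
GET music/_search
{
  "query": {
    "match": {
      "title": "rock"
    }
  }
}
----
// TEST[skip:TODO]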

[discrete#es-connectors-tutorial-api-troubleshooting]
==== Troubleshooting

Use the following command to inspect the latest sync job's status:

[source,console]
----
GET _connector/_sync_job?connector_id=my-connector-id&size=1
----
// TEST[skip:TODO]

If the connector encountered any errors during the sync, you'll find these in the `error` field.

[discrete#es-connectors-tutorial-api-cleanup]
==== Cleaning up

To delete the connector and its associated sync jobs, run this command:

[source,console]
----
DELETE _connector/my-connector-id?delete_sync_jobs=true
----
// TEST[skip:TODO]

This won't delete the {es} index that was created by the connector to store the data.
Delete the `music` index by running the following command:

[source,console]
----
DELETE music
----
// TEST[skip:TODO]

To remove the PostgreSQL container, run the following commands:

[source,sh]
----
docker stop postgres
docker rm postgres
----

To remove the connector service, run the following commands:

[source,sh]
----
docker stop <container_id>
docker rm <container_id>
----

[discrete#es-connectors-tutorial-api-next-steps]
==== Next steps

Congratulations! You've successfully set up a self-managed connector using the Connector APIs.

Here are some next steps to explore:

* Learn more about the {ref}/connector-apis.html[Connector APIs].
* Learn how to deploy {es}, {kib}, and the connectors service using Docker Compose in our https://github.com/elastic/connectors/tree/main/scripts/stack#readme[quickstart guide].