[#es-postgresql-connector-client-tutorial]
=== PostgreSQL self-managed connector tutorial
++++
<titleabbrev>Tutorial</titleabbrev>
++++
This tutorial walks you through the process of creating a self-managed connector for a PostgreSQL data source.
You'll be using the <<es-build-connector, self-managed connector>> workflow in the Kibana UI.
This means you'll be deploying the connector on your own infrastructure.
Refer to the <<es-connectors-postgresql, Elastic PostgreSQL connector reference>> for more information about this connector.

You'll use the {connectors-python}[connector framework^] to create the connector.
In this exercise, you'll be working in both the terminal (or your IDE) and the Kibana UI.

If you want to deploy a self-managed connector for another data source, use this tutorial as a blueprint.
Refer to the list of available <<es-build-connector,self-managed connectors>>.

[TIP]
====
Want to get started quickly testing a self-managed connector using Docker Compose?
Refer to this https://github.com/elastic/connectors/tree/main/scripts/stack#readme[README] in the `elastic/connectors` repo for more information.
====

[discrete#es-postgresql-connector-client-tutorial-prerequisites]
==== Prerequisites

[discrete#es-postgresql-connector-client-tutorial-prerequisites-elastic]
===== Elastic prerequisites

First, ensure you satisfy the <<es-build-connector-prerequisites, prerequisites>> for self-managed connectors.

[discrete#es-postgresql-connector-client-tutorial-postgresql-prerequisites]
===== PostgreSQL prerequisites

You need:

* PostgreSQL version 11+.
* Tables must be owned by a PostgreSQL user.
* Database `superuser` privileges are required to index all database tables.
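
For example, here's a minimal sketch of creating a dedicated superuser role for the connector to use.
The role name and password are placeholders; adapt them to your environment and security policies:

[source,shell]
----
psql -U postgres -c "CREATE USER connector_user WITH PASSWORD '<YOUR-PASSWORD>' SUPERUSER;"
----
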
[TIP]
====
You should enable recording of the commit time of PostgreSQL transactions.
Otherwise, _all_ data will be indexed in every sync.

By default, `track_commit_timestamp` is `off`.
Enable it by running the following command on the PostgreSQL server command line:

[source,shell]
----
ALTER SYSTEM SET track_commit_timestamp = on;
----

Then restart the PostgreSQL server.
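
After the restart, you can confirm that the setting took effect, for example with `psql` (adjust the user and connection flags to your environment):

[source,shell]
----
psql -U postgres -c "SHOW track_commit_timestamp;"
----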
====

[discrete#es-postgresql-connector-client-tutorial-steps]
==== Steps

To complete this tutorial, you'll work through the following steps:

. <<es-postgresql-connector-client-tutorial-create-index, Create an Elasticsearch index>>
. <<es-postgresql-connector-client-tutorial-setup-connector, Set up the connector>>
. <<es-postgresql-connector-client-tutorial-run-connector-service, Run the `connectors` connector service>>
. <<es-postgresql-connector-client-tutorial-sync-data-source, Sync your PostgreSQL data source>>

[discrete#es-postgresql-connector-client-tutorial-create-index]
==== Create an Elasticsearch index

Elastic connectors enable you to create searchable, read-only replicas of your data sources in Elasticsearch.
The first step in setting up your self-managed connector is to create an index.

In the {kibana-ref}[Kibana^] UI, navigate to *Search > Content > Elasticsearch indices* from the main menu, or use the {kibana-ref}/kibana-concepts-analysts.html#_finding_your_apps_and_objects[global search field].

Create a new connector index:

. Under *Select an ingestion method* choose *Connector*.
. Choose *PostgreSQL* from the list of connectors.
. Name your index and optionally change the language analyzer to match the human language of your data source.
(The index name you provide is automatically prefixed with `search-`.)
. Save your changes.

The index is created and ready to configure.

[discrete#es-postgresql-connector-client-tutorial-gather-elastic-details]
.Gather Elastic details
****
Before you can configure the connector, you need to gather some details about your Elastic deployment:

* *Elasticsearch endpoint*.
** If you're an Elastic Cloud user, find your deployment's Elasticsearch endpoint in the Cloud UI under *Cloud > Deployments > <your-deployment> > Elasticsearch*.
** If you're running your Elastic deployment and the connector service in Docker, the default Elasticsearch endpoint is `http://host.docker.internal:9200`.
* *API key*.
You'll need this key to configure the connector.
Use an existing key or create a new one.
* *Connector ID*.
Your unique connector ID is automatically generated when you create the connector.
Find this in the Kibana UI.
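
If you need to create a new API key from the command line, here's a minimal sketch using `curl` and basic authentication against the Elasticsearch security API (the endpoint, credentials, and key name are placeholders):

[source,shell]
----
curl -X POST "<YOUR-ELASTICSEARCH-ENDPOINT>/_security/api_key" \
  -u elastic:<YOUR-PASSWORD> \
  -H "Content-Type: application/json" \
  -d '{"name": "postgresql-connector-tutorial"}'
----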
****

[discrete#es-postgresql-connector-client-tutorial-setup-connector]
==== Set up the connector

Once you've created an index, you can set up the connector.
You will be guided through this process in the UI.

. *Edit the name and description for the connector.*
This will help your team identify the connector.
. *Clone and edit the connector service code.*
For this example, we'll use the {connectors-python}[Python framework^].
Follow these steps:
** Clone or fork that repository locally with the following command: `git clone https://github.com/elastic/connectors`.
** Open the `config.yml` configuration file in your editor of choice.
** Replace the values for `host`, `api_key`, and `connector_id` with the values you gathered <<es-postgresql-connector-client-tutorial-gather-elastic-details,earlier>>.
Use the `service_type` value `postgresql` for this connector.
+
.*Expand* to see an example `config.yml` file
[%collapsible]
====
Replace the values for `host`, `api_key`, and `connector_id` with your own values.
Use the `service_type` value `postgresql` for this connector.

[source,yaml]
----
elasticsearch:
  host: https://<my-elastic-deployment>.es.us-west2.gcp.elastic-cloud.com # Your Elasticsearch endpoint
  api_key: '<YOUR-API-KEY>' # Your top-level Elasticsearch API key
...
connectors:
  -
    connector_id: "<YOUR-CONNECTOR-ID>" # Your connector ID
    api_key: "<YOUR-API-KEY>" # Your scoped connector index API key (optional). If not provided, the top-level API key is used.
    service_type: "postgresql" # The service type for your connector

sources:
  # mongodb: connectors.sources.mongo:MongoDataSource
  # s3: connectors.sources.s3:S3DataSource
  # dir: connectors.sources.directory:DirectoryDataSource
  # mysql: connectors.sources.mysql:MySqlDataSource
  # network_drive: connectors.sources.network_drive:NASDataSource
  # google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
  # azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
  postgresql: connectors.sources.postgresql:PostgreSQLDataSource
  # oracle: connectors.sources.oracle:OracleDataSource
  # sharepoint: connectors.sources.sharepoint:SharepointDataSource
  # mssql: connectors.sources.mssql:MSSQLDataSource
  # jira: connectors.sources.jira:JiraDataSource
----
====

[discrete#es-postgresql-connector-client-tutorial-run-connector-service]
==== Run the connector service

Now that you've configured the connector code, you can run the connector service.

In your terminal or IDE:

. `cd` into the root of your `connectors` clone/fork.
. Run the following command: `make run`.

The connector service should now be running.
The UI will let you know that the connector has successfully connected to Elasticsearch.
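
Putting the steps together, the full local sequence looks like this (assuming you've already edited `config.yml` as described above):

[source,shell]
----
git clone https://github.com/elastic/connectors
cd connectors
# Edit config.yml with your host, api_key, and connector_id values, then:
make run
----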

Here we're working locally.
In production setups, you'll deploy the connector service to your own infrastructure.

If you prefer to use Docker, refer to the {connectors-python}/docs/DOCKER.md[repo docs^] for instructions.

[discrete#es-postgresql-connector-client-tutorial-sync-data-source]
==== Sync your PostgreSQL data source

[discrete#es-postgresql-connector-client-tutorial-sync-data-source-details]
===== Enter your PostgreSQL data source details

Once you've configured the connector, you can use it to index your data source.
You can now enter your PostgreSQL instance details in the Kibana UI.

Enter the following information:

* *Host*.
Server host address for your PostgreSQL instance.
* *Port*.
Port number for your PostgreSQL instance.
* *Username*.
Username of the PostgreSQL account.
* *Password*.
Password for that user.
* *Database*.
Name of the PostgreSQL database.
* *Comma-separated list of tables*.
`*` will fetch data from all tables in the configured database.

Once you've entered all these details, select *Save configuration*.
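
Before launching a sync, you can optionally sanity-check the same connection details from the command line.
Here's a sketch using `psql` (all values are placeholders); `\dt` lists the tables the connector will see:

[source,shell]
----
psql "host=<HOST> port=<PORT> dbname=<DATABASE> user=<USERNAME>" -c '\dt'
----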

[discrete#es-postgresql-connector-client-tutorial-sync-data-source-launch-sync]
===== Launch a sync

If you navigate to the *Overview* tab in the Kibana UI, you can see the connector's _ingestion status_.
This should now have changed to *Configured*.

It's time to launch a sync by selecting the *Sync* button.

If you navigate to the terminal window where you're running the connector service, you should see output like the following:

[source,shell]
----
[FMWK][13:22:26][INFO] Fetcher <create: 499 update: 0 |delete: 0>
[FMWK][13:22:26][INFO] Fetcher <create: 599 update: 0 |delete: 0>
[FMWK][13:22:26][INFO] Fetcher <create: 699 update: 0 |delete: 0>
...
[FMWK][23:22:28][INFO] [oRXQwYYBLhXTs-qYpJ9i] Sync done: 3864 indexed, 0 deleted. (27 seconds)
----

This confirms the connector has fetched records from your PostgreSQL table(s) and transformed them into documents in your Elasticsearch index.

Verify your Elasticsearch documents in the *Documents* tab in the Kibana UI.
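
You can also check the document count directly against the index, for example with `curl` (the endpoint, API key, and index name are placeholders):

[source,shell]
----
curl -H "Authorization: ApiKey <YOUR-API-KEY>" \
  "<YOUR-ELASTICSEARCH-ENDPOINT>/search-<INDEX-NAME>/_count"
----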

If you're happy with the results, set a recurring sync schedule in the *Scheduling* tab.
This will ensure your _searchable_ data in Elasticsearch is always up to date with changes to your PostgreSQL data source.

[discrete#es-postgresql-connector-client-tutorial-learn-more]
==== Learn more

* <<es-build-connector, Overview of self-managed connectors and frameworks>>
* {connectors-python}[Elastic connector framework repository^]
* <<es-connectors-postgresql, Elastic PostgreSQL connector reference>>
* <<es-connectors, Overview of all Elastic connectors>>
* <<es-native-connectors, Elastic managed connectors in Elastic Cloud>>