[#es-connectors-mongodb]
=== Elastic MongoDB connector reference
++++
<titleabbrev>MongoDB</titleabbrev>
++++
// Attributes used in this file
:service-name: MongoDB
:service-name-stub: mongodb
The _Elastic MongoDB connector_ is a <<es-connectors,connector>> for https://www.mongodb.com[MongoDB^] data sources.
This connector is written in Python using the {connectors-python}[Elastic connector framework^].
View the {connectors-python}/connectors/sources/{service-name-stub}.py[*source code* for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
.Choose your connector reference
*******************************
Are you using a managed connector on Elastic Cloud or a self-managed connector? Expand the documentation based on your deployment method.
*******************************
// //////// //// //// //// //// //// //// ////////
// //////// NATIVE CONNECTOR REFERENCE ///////
// //////// //// //// //// //// //// //// ////////
[discrete#es-connectors-mongodb-native-connector-reference]
==== *Elastic managed connector reference*
.View *Elastic managed connector* reference
[%collapsible]
===============
[discrete#es-connectors-mongodb-prerequisites]
===== Availability and prerequisites
This connector is available as a *managed connector* in Elastic versions *8.5.0 and later*.
To use this connector natively in Elastic Cloud, satisfy all <<es-native-connectors-prerequisites,managed connector requirements>>.
[discrete#es-connectors-mongodb-compatibility]
===== Compatibility
This connector is compatible with *MongoDB Atlas* and *MongoDB 3.6 and later*.
The data source and your Elastic deployment must be able to communicate with each other over a network.
[discrete#es-connectors-mongodb-configuration]
===== Configuration
Each time you create an index to be managed by this connector, you will create a new connector configuration.
You will need some or all of the following information about the data source.
Server hostname::
The URI of the MongoDB host.
Examples:
+
* `mongodb+srv://my_username:my_password@cluster0.mongodb.net/mydb?w=majority`
* `mongodb://127.0.0.1:27017`
Username::
The MongoDB username the connector will use.
+
The user must have access to the configured database and collection.
You may want to create a dedicated, read-only user for each connector.
Password::
The MongoDB password the connector will use.
Database::
The MongoDB database to sync.
The database must be accessible using the configured username and password.
Collection::
The MongoDB collection to sync.
The collection must exist within the configured database.
The collection must be accessible using the configured username and password.
Direct connection::
Toggle to use the https://www.mongodb.com/docs/ruby-driver/current/reference/create-client/#direct-connection[direct connection option for the MongoDB client^].
Disabled by default.
SSL/TLS Connection::
Toggle to establish a secure connection to the MongoDB server using SSL/TLS encryption.
Ensure that your MongoDB deployment supports SSL/TLS connections.
*Enable* this option if your MongoDB cluster uses DNS SRV records (for example, MongoDB Atlas).
+
Disabled by default.
Certificate Authority (.pem)::
Specifies the root certificate from the Certificate Authority.
The value of the certificate is used to validate the certificate presented by the MongoDB instance.
[TIP]
====
Atlas users can leave this blank because https://www.mongodb.com/docs/atlas/reference/faq/security/#which-certificate-authority-signs-mongodb-atlas-tls-certificates-[Atlas uses a widely trusted root CA].
====
Skip certificate verification::
Skips various certificate validations (if SSL is enabled).
Disabled by default.
[NOTE]
====
We strongly recommend leaving this option disabled in production environments.
====
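
When credentials are embedded in the host URI, as in the first example above, reserved characters such as `@` or `:` in the username or password must be percent-encoded. The following is a minimal Python sketch of building such a URI. The helper name `build_mongodb_uri` and the sample values are hypothetical and are not part of the connector.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, username=None, password=None, srv=False, database=""):
    """Build a MongoDB connection URI, percent-encoding any credentials."""
    scheme = "mongodb+srv" if srv else "mongodb"
    credentials = ""
    if username is not None and password is not None:
        # Reserved characters such as "@" or ":" must be escaped in credentials.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"{scheme}://{credentials}{host}/{database}"


# Hypothetical sample values, for illustration only.
uri = build_mongodb_uri(
    "cluster0.mongodb.net", "my_username", "p@ss:word", srv=True, database="mydb"
)
print(uri)  # mongodb+srv://my_username:p%40ss%3Aword@cluster0.mongodb.net/mydb
```

If you enter the username and password in their own configuration fields instead, the host URI does not need to carry credentials.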
[discrete#es-connectors-mongodb-create-native-connector]
===== Create a {service-name} connector
include::_connectors-create-native.asciidoc[]
[discrete#es-connectors-mongodb-usage]
===== Usage
To use this connector as a *managed connector*, use the *Connector* workflow.
See <<es-native-connectors>>.
For additional operations, see <<es-connectors-usage>>.
[discrete#es-connectors-mongodb-example]
===== Example
An example is available for this connector.
See <<es-mongodb-start>>.
[discrete#es-connectors-mongodb-known-issues]
===== Known issues
[discrete#es-connectors-mongodb-known-issues-ssl-tls-812]
====== SSL must be enabled for MongoDB Atlas
* A bug introduced in *8.12.0* causes the connector to fail to sync MongoDB *Atlas* URLs (`mongodb+srv`) unless SSL/TLS is enabled.
// https://github.com/elastic/sdh-enterprise-search/issues/1283#issuecomment-1919731668
[discrete#es-connectors-mongodb-known-issues-expressions-and-variables-in-aggregation-pipelines]
====== Expressions and variables in aggregation pipelines
It's not possible to use expressions like `new Date()` inside an aggregation pipeline.
These expressions won't be evaluated by the underlying MongoDB client, but will be passed as a string to the MongoDB instance.
A possible workaround is to use https://www.mongodb.com/docs/manual/reference/aggregation-variables/[aggregation variables].
Incorrect (`new Date()` will be interpreted as a string):
[source,js]
----
{
"aggregate": {
"pipeline": [
{
"$match": {
"expiresAt": {
"$gte": "new Date()"
}
}
}
]
}
}
----
// NOTCONSOLE
Correct (usage of https://www.mongodb.com/docs/manual/reference/aggregation-variables/#mongodb-variable-variable.NOW[$$NOW]):
[source,js]
----
{
"aggregate": {
"pipeline": [
{
"$addFields": {
"current_date": {
"$toDate": "$$NOW"
}
}
},
{
"$match": {
"$expr": {
"$gte": [
"$expiresAt",
"$current_date"
]
}
}
}
]
}
}
----
// NOTCONSOLE
[discrete#es-connectors-mongodb-known-issues-tls-with-invalid-cert]
====== Connecting with self-signed or custom CA TLS Cert
Currently, the MongoDB connector does not support working with self-signed or custom CA certs when connecting to your self-managed MongoDB host.
[WARNING]
====
The following workaround should not be used in production.
====
In development environments, you can work around this limitation by appending query parameters to the configured host.
For example, if your host is `mongodb+srv://my.mongo.host.com`, appending `/?tls=true&tlsAllowInvalidCertificates=true` disables TLS certificate verification.
The full host in this example looks like this:
`mongodb+srv://my.mongo.host.com/?tls=true&tlsAllowInvalidCertificates=true`
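
The following Python sketch shows one way to add these parameters to a host URI that may already carry a path or query string. The helper name `with_insecure_tls` is hypothetical and is not part of the connector. As with the workaround itself, it is intended for development only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_insecure_tls(host_uri):
    """Return host_uri with the development-only TLS query parameters added."""
    parts = urlsplit(host_uri)
    query = dict(parse_qsl(parts.query))
    query.update({"tls": "true", "tlsAllowInvalidCertificates": "true"})
    # MongoDB URIs need a "/" between the host and the query string.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


print(with_insecure_tls("mongodb+srv://my.mongo.host.com"))
# mongodb+srv://my.mongo.host.com/?tls=true&tlsAllowInvalidCertificates=true
```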
See <<es-connectors-known-issues>> for any issues affecting all connectors.
[discrete#es-connectors-mongodb-troubleshooting]
===== Troubleshooting
See <<es-connectors-troubleshooting>>.
[discrete#es-connectors-mongodb-security]
===== Security
See <<es-connectors-security>>.
[discrete#es-connectors-mongodb-syncs]
===== Documents and syncs
The following describes the default syncing behavior for this connector.
Use <<es-sync-rules,sync rules>> and {ref}/ingest-pipeline-search.html[ingest pipelines] to customize syncing for specific indices.
All documents in the configured MongoDB database and collection are extracted and transformed into documents in your Elasticsearch index.
* The connector creates one *Elasticsearch document* for each MongoDB document in the configured database and collection.
* For each document, the connector transforms each MongoDB field into an *Elasticsearch field*.
* For each field, Elasticsearch {ref}/dynamic-mapping.html[dynamically determines the *data type*^].
This results in Elasticsearch documents that closely match the original MongoDB documents.
The Elasticsearch mapping is created when the first document is created.
Each sync is a "full" sync.
For each MongoDB document discovered:
* If it does not exist, the document is created in Elasticsearch.
* If it already exists in Elasticsearch, the Elasticsearch document is replaced and the version is incremented.
* If an existing Elasticsearch document no longer exists in the MongoDB collection, it is deleted from Elasticsearch.
* Embedded documents are stored as an `object` field in the parent document.
This is recursive, because embedded documents can themselves contain embedded documents.
[NOTE]
====
* Files bigger than 10 MB won't be extracted.
* Permissions are not synced. All documents indexed to an Elastic deployment will be visible to *all users with access* to that Elastic deployment.
====
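
The create, replace, and delete outcomes of a full sync can be sketched as set operations on document ids. This is a minimal Python illustration of the behavior described above, not the connector's implementation. The function name `plan_full_sync` and the sample ids are hypothetical.

```python
def plan_full_sync(source_ids, indexed_ids):
    """Classify document ids into create, replace, and delete operations."""
    source, indexed = set(source_ids), set(indexed_ids)
    return {
        # Found in MongoDB but not yet in Elasticsearch.
        "create": sorted(source - indexed),
        # Found in both: the Elasticsearch document is replaced.
        "replace": sorted(source & indexed),
        # No longer in the MongoDB collection: deleted from Elasticsearch.
        "delete": sorted(indexed - source),
    }


plan = plan_full_sync(source_ids=["a", "b", "c"], indexed_ids=["b", "c", "d"])
print(plan)  # {'create': ['a'], 'replace': ['b', 'c'], 'delete': ['d']}
```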
[discrete#es-connectors-mongodb-sync-rules]
===== Sync rules
The following sections describe <<es-sync-rules>> for this connector.
<<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
<<es-sync-rules-advanced,Advanced rules>> for MongoDB can be used to express either `find` queries or aggregation pipelines.
They can also be used to tune options available when issuing these queries/pipelines.
[discrete#es-connectors-mongodb-sync-rules-find]
====== `find` queries
[NOTE]
====
You must create a https://www.mongodb.com/docs/current/core/indexes/index-types/index-text/[text index^] on the MongoDB collection in order to perform text searches.
====
For `find` queries, the structure of this JSON DSL should look like:
[source,js]
----
{
"find":{
"filter": {
// find query goes here
},
"options":{
// query options go here
}
}
}
----
// NOTCONSOLE
For example:
[source,js]
----
{
"find": {
"filter": {
"$text": {
"$search": "garden",
"$caseSensitive": false
}
},
"skip": 10,
"limit": 1000
}
}
----
// NOTCONSOLE
`find` queries also support additional options, for example the `projection` object:
[source,js]
----
{
"find": {
"filter": {
"languages": [
"English"
],
"runtime": {
"$gt":90
}
},
"projection":{
"tomatoes": 1
}
}
}
----
// NOTCONSOLE
Where the available options are:
* `allow_disk_use` (true, false) — When set to true, the server can write temporary data to disk while executing the find operation. This option is only available on MongoDB server versions 4.4 and newer.
* `allow_partial_results` (true, false) — Allows the query to get partial results if some shards are down.
* `batch_size` (Integer) — The number of documents returned in each batch of results from MongoDB.
* `filter` (Object) — The filter criteria for the query.
* `limit` (Integer) — The max number of docs to return from the query.
* `max_time_ms` (Integer) — The maximum amount of time to allow the query to run, in milliseconds.
* `no_cursor_timeout` (true, false) — The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to prevent that.
* `projection` (Array, Object) — The fields to include or exclude from each doc in the result set. If an array, it should have at least one item.
* `return_key` (true, false) — Return index keys rather than the documents.
* `show_record_id` (true, false) — Return the `$recordId` for each doc in the result set.
* `skip` (Integer) — The number of docs to skip before returning results.
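
Before saving a rule, it can help to check its keys against the option names listed above, because a misspelled option is easy to miss. The following Python sketch assumes that options sit directly under `find`, as in the worked examples above. The helper name `unknown_find_options` is hypothetical, and the connector's own validation may differ.

```python
# Option names copied from the list above.
FIND_OPTIONS = {
    "allow_disk_use", "allow_partial_results", "batch_size", "filter", "limit",
    "max_time_ms", "no_cursor_timeout", "projection", "return_key",
    "show_record_id", "skip",
}


def unknown_find_options(rule):
    """Return keys under "find" that are not in the documented option list."""
    return sorted(set(rule.get("find", {})) - FIND_OPTIONS)


# "limt" is a deliberate typo for "limit".
rule = {
    "find": {
        "filter": {"runtime": {"$gt": 90}},
        "projection": {"tomatoes": 1},
        "limt": 1000,
    }
}
print(unknown_find_options(rule))  # ['limt']
```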
[discrete#es-connectors-mongodb-sync-rules-aggregation]
====== Aggregation pipelines
Similarly, for aggregation pipelines, the structure of the JSON DSL should look like:
[source,js]
----
{
"aggregate":{
"pipeline": [
// pipeline elements go here
],
"options": {
// pipeline options go here
}
}
}
----
// NOTCONSOLE
Where the available options are:
* `allowDiskUse` (true, false) — Set to true if disk usage is allowed during the aggregation.
* `batchSize` (Integer) — The number of documents to return per batch.
* `bypassDocumentValidation` (true, false) — Whether or not to skip document level validation.
* `collation` (Object) — The collation to use.
* `comment` (String) — A user-provided comment to attach to this command.
* `hint` (String) — The index to use for the aggregation.
* `let` (Object) — Mapping of variables to use in the pipeline. See the server documentation for details.
* `maxTimeMs` (Integer) — The maximum amount of time in milliseconds to allow the aggregation to run.
[discrete#es-connectors-mongodb-migration-from-ruby]
===== Migrating from the Ruby connector framework
As part of the 8.8.0 release, the MongoDB connector was moved from the {connectors-python}[Ruby connectors framework^] to the {connectors-python}[Elastic connector framework^].
This change introduces minor formatting modifications to data ingested from MongoDB:
1. Nested object ID field names have changed from `_id` to `id`. For example, a field named `customer._id` is now named `customer.id`.
2. The date format has changed from `YYYY-MM-DD'T'HH:mm:ss.fff'Z'` to `YYYY-MM-DD'T'HH:mm:ss`.
If your MongoDB connector stopped working after migrating from 8.7.x to 8.8.x, read the workaround outlined in <<es-connectors-known-issues>>.
If that does not work, we recommend deleting the search index attached to this connector and re-creating a MongoDB connector from scratch.
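
To see how the two formatting changes affect a document, the following Python sketch converts a document in the old format to the new one. It is an illustration of the changes listed above, not code from either framework. The function name `to_new_format` and the sample document are hypothetical.

```python
from datetime import datetime


def to_new_format(doc, nested=False):
    """Mimic the 8.8.0 formatting changes on a document in the old format."""
    converted = {}
    for key, value in doc.items():
        if nested and key == "_id":
            key = "id"  # nested "_id" fields are renamed to "id"
        if isinstance(value, dict):
            value = to_new_format(value, nested=True)
        elif isinstance(value, str):
            try:
                parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
                value = parsed.strftime("%Y-%m-%dT%H:%M:%S")
            except ValueError:
                pass  # not a date string in the old format, keep as-is
        converted[key] = value
    return converted


old = {"_id": "1", "customer": {"_id": "42", "since": "2023-05-01T10:15:30.123Z"}}
print(to_new_format(old))
# {'_id': '1', 'customer': {'id': '42', 'since': '2023-05-01T10:15:30'}}
```

Queries, mappings, or ingest pipelines that reference the old field names or date format may need the same adjustments.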
// Closing the collapsible section
===============
// //////// //// //// //// //// //// //// ////////
// //////// CONNECTOR CLIENT REFERENCE ///////
// //////// //// //// //// //// //// //// ////////
[discrete#es-connectors-mongodb-connector-client-reference]
==== *Self-managed connector*
.View *self-managed connector* reference
[%collapsible]
===============
[discrete#es-connectors-mongodb-client-prerequisites]
===== Availability and prerequisites
This connector is also available as a *self-managed connector* from the *Elastic connector framework*.
To use this connector as a self-managed connector, satisfy all <<es-build-connector,self-managed connector requirements>>.
[discrete#es-connectors-mongodb-client-compatibility]
===== Compatibility
This connector is compatible with *MongoDB Atlas* and *MongoDB 3.6 and later*.
The data source and your Elastic deployment must be able to communicate with each other over a network.
[discrete#es-connectors-mongodb-client-configuration]
===== Configuration
[TIP]
====
When using the <<es-build-connector, self-managed connector>> workflow, these fields initially use the default configuration set in the {connectors-python}/connectors/sources/{service-name-stub}.py[connector source code^].
These are set in the `get_default_configuration` function definition.
These configurable fields will be rendered with their respective *labels* in the Kibana UI.
Once connected, you'll be able to update these values in Kibana.
====
The following configuration fields are required to set up the connector:
`host`::
The URI of the MongoDB host.
Examples:
+
* `mongodb+srv://my_username:my_password@cluster0.mongodb.net/mydb?w=majority`
* `mongodb://127.0.0.1:27017`
`user`::
The MongoDB username the connector will use.
+
The user must have access to the configured database and collection.
You may want to create a dedicated, read-only user for each connector.
`password`::
The MongoDB password the connector will use.
[NOTE]
====
Anonymous authentication is supported for _testing purposes only_ and should not be used in production.
Omit the username and password to use the default values.
====
`database`::
The MongoDB database to sync.
The database must be accessible using the configured username and password.
`collection`::
The MongoDB collection to sync.
The collection must exist within the configured database.
The collection must be accessible using the configured username and password.
`direct_connection`::
Whether to use the https://www.mongodb.com/docs/ruby-driver/current/reference/create-client/#direct-connection[direct connection option for the MongoDB client^].
Default value is `False`.
`ssl_enabled`::
Whether to establish a secure connection to the MongoDB server using SSL/TLS encryption.
Ensure that your MongoDB deployment supports SSL/TLS connections.
*Enable* this option if your MongoDB cluster uses DNS SRV records (for example, MongoDB Atlas).
+
Default value is `False`.
`ssl_ca`::
Specifies the root certificate from the Certificate Authority.
The value of the certificate is used to validate the certificate presented by the MongoDB instance.
[TIP]
====
Atlas users can leave this blank because https://www.mongodb.com/docs/atlas/reference/faq/security/#which-certificate-authority-signs-mongodb-atlas-tls-certificates-[Atlas uses a widely trusted root CA].
====
`tls_insecure`::
Skips various certificate validations (if SSL is enabled).
Default value is `False`.
[NOTE]
====
We strongly recommend leaving this option disabled in production environments.
====
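
Some of these fields interact: `tls_insecure` only applies when `ssl_enabled` is on, and `ssl_enabled` should be on for DNS SRV hosts. The following Python sketch checks a configuration mapping for those two combinations, using the field names and default values listed above. The helper name `config_warnings` and the plain-dict representation are assumptions for illustration and are not how the connector framework stores its configuration.

```python
# Default values as documented for the boolean fields above.
DEFAULTS = {"direct_connection": False, "ssl_enabled": False, "tls_insecure": False}


def config_warnings(config):
    """Return human-readable warnings for a connector configuration mapping."""
    merged = {**DEFAULTS, **config}
    warnings = []
    if merged.get("host", "").startswith("mongodb+srv://") and not merged["ssl_enabled"]:
        warnings.append("ssl_enabled should be on for DNS SRV hosts such as Atlas")
    if merged["tls_insecure"] and not merged["ssl_enabled"]:
        warnings.append("tls_insecure has no effect unless ssl_enabled is on")
    return warnings


# Hypothetical sample configuration.
sample = {
    "host": "mongodb+srv://cluster0.mongodb.net",
    "database": "mydb",
    "collection": "movies",
}
print(config_warnings(sample))
```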
[discrete#es-connectors-mongodb-create-connector-client]
===== Create a {service-name} connector
include::_connectors-create-client.asciidoc[]
[discrete#es-connectors-mongodb-client-usage]
===== Usage
To use this connector as a *self-managed connector*, see <<es-build-connector>>.
For additional usage operations, see <<es-connectors-usage>>.
[discrete#es-connectors-mongodb-client-example]
===== Example
An example is available for this connector.
See <<es-mongodb-start>>.
[discrete#es-connectors-mongodb-client-known-issues]
===== Known issues
[discrete#es-connectors-mongodb-client-known-issues-ssl-tls-812]
====== SSL must be enabled for MongoDB Atlas
* A bug introduced in *8.12.0* causes the connector to fail to sync MongoDB *Atlas* URLs (`mongodb+srv`) unless SSL/TLS is enabled.
// https://github.com/elastic/sdh-enterprise-search/issues/1283#issuecomment-1919731668
[discrete#es-connectors-mongodb-client-known-issues-expressions-and-variables-in-aggregation-pipelines]
====== Expressions and variables in aggregation pipelines
It's not possible to use expressions like `new Date()` inside an aggregation pipeline.
These expressions won't be evaluated by the underlying MongoDB client, but will be passed as a string to the MongoDB instance.
A possible workaround is to use https://www.mongodb.com/docs/manual/reference/aggregation-variables/[aggregation variables].
Incorrect (`new Date()` will be interpreted as a string):
[source,js]
----
{
"aggregate": {
"pipeline": [
{
"$match": {
"expiresAt": {
"$gte": "new Date()"
}
}
}
]
}
}
----
// NOTCONSOLE
Correct (usage of https://www.mongodb.com/docs/manual/reference/aggregation-variables/#mongodb-variable-variable.NOW[$$NOW]):
[source,js]
----
{
"aggregate": {
"pipeline": [
{
"$addFields": {
"current_date": {
"$toDate": "$$NOW"
}
}
},
{
"$match": {
"$expr": {
"$gte": [
"$expiresAt",
"$current_date"
]
}
}
}
]
}
}
----
// NOTCONSOLE
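
Because advanced rules are plain JSON, an expression such as `new Date()` can only ever appear as text. The following Python sketch shows this by parsing the incorrect rule, then builds the `$$NOW` variant as a regular data structure. The helper name `not_expired_rule` is hypothetical, and the pipeline mirrors the correct example above.

```python
import json

incorrect = json.loads(
    '{"aggregate": {"pipeline": [{"$match": {"expiresAt": {"$gte": "new Date()"}}}]}}'
)
value = incorrect["aggregate"]["pipeline"][0]["$match"]["expiresAt"]["$gte"]
print(type(value).__name__, repr(value))  # str 'new Date()'


def not_expired_rule(field="expiresAt"):
    """Build an aggregation rule that compares `field` with the server's $$NOW."""
    return {
        "aggregate": {
            "pipeline": [
                {"$addFields": {"current_date": {"$toDate": "$$NOW"}}},
                {"$match": {"$expr": {"$gte": [f"${field}", "$current_date"]}}},
            ]
        }
    }


# The rule serializes to JSON that can be pasted into the advanced rules editor.
print(json.dumps(not_expired_rule(), indent=2))
```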
[discrete#es-connectors-mongodb-client-known-issues-tls-with-invalid-cert]
====== Connecting with self-signed or custom CA TLS Cert
Currently, the MongoDB connector does not support working with self-signed or custom CA certs when connecting to your self-managed MongoDB host.
[WARNING]
====
The following workaround should not be used in production.
====
In development environments, you can work around this limitation by appending query parameters to the configured host.
For example, if your host is `mongodb+srv://my.mongo.host.com`, appending `/?tls=true&tlsAllowInvalidCertificates=true` disables TLS certificate verification.
The full host in this example looks like this:
`mongodb+srv://my.mongo.host.com/?tls=true&tlsAllowInvalidCertificates=true`
[discrete#es-connectors-mongodb-known-issues-docker-image-fails]
====== Docker image errors out for versions 8.12.0 and 8.12.1
A bug introduced in *8.12.0* causes the Connectors docker image to error out if run using MongoDB as its source.
The command line will output the error `cannot import name 'coroutine' from 'asyncio'`.
* This issue is fixed in versions *8.12.2* and *8.13.0*.
* This bug does not affect Elastic managed connectors.
See <<es-connectors-known-issues>> for any issues affecting all connectors.
[discrete#es-connectors-mongodb-client-troubleshooting]
===== Troubleshooting
See <<es-connectors-troubleshooting>>.
[discrete#es-connectors-mongodb-client-security]
===== Security
See <<es-connectors-security>>.
[discrete#es-connectors-mongodb-client-docker]
===== Deployment using Docker
include::_connectors-docker-instructions.asciidoc[]
[discrete#es-connectors-mongodb-client-syncs]
===== Documents and syncs
The following describes the default syncing behavior for this connector.
Use <<es-sync-rules,sync rules>> and {ref}/ingest-pipeline-search.html[ingest pipelines] to customize syncing for specific indices.
All documents in the configured MongoDB database and collection are extracted and transformed into documents in your Elasticsearch index.
* The connector creates one *Elasticsearch document* for each MongoDB document in the configured database and collection.
* For each document, the connector transforms each MongoDB field into an *Elasticsearch field*.
* For each field, Elasticsearch {ref}/dynamic-mapping.html[dynamically determines the *data type*^].
This results in Elasticsearch documents that closely match the original MongoDB documents.
The Elasticsearch mapping is created when the first document is created.
Each sync is a "full" sync.
For each MongoDB document discovered:
* If it does not exist, the document is created in Elasticsearch.
* If it already exists in Elasticsearch, the Elasticsearch document is replaced and the version is incremented.
* If an existing Elasticsearch document no longer exists in the MongoDB collection, it is deleted from Elasticsearch.
* Embedded documents are stored as an `object` field in the parent document.
This is recursive, because embedded documents can themselves contain embedded documents.
[NOTE]
====
* Files bigger than 10 MB won't be extracted.
* Permissions are not synced. All documents indexed to an Elastic deployment will be visible to *all users with access* to that Elastic deployment.
====
[discrete#es-connectors-mongodb-client-sync-rules]
===== Sync rules
The following sections describe <<es-sync-rules>> for this connector.
<<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
<<es-sync-rules-advanced,Advanced rules>> for MongoDB can be used to express either `find` queries or aggregation pipelines.
They can also be used to tune options available when issuing these queries/pipelines.
[discrete#es-connectors-mongodb-client-sync-rules-find]
====== `find` queries
[NOTE]
====
You must create a https://www.mongodb.com/docs/current/core/indexes/index-types/index-text/[text index^] on the MongoDB collection in order to perform text searches.
====
For `find` queries, the structure of this JSON DSL should look like:
[source,js]
----
{
"find":{
"filter": {
// find query goes here
},
"options":{
// query options go here
}
}
}
----
// NOTCONSOLE
For example:
[source,js]
----
{
"find": {
"filter": {
"$text": {
"$search": "garden",
"$caseSensitive": false
}
},
"skip": 10,
"limit": 1000
}
}
----
// NOTCONSOLE
`find` queries also support additional options, for example the `projection` object:
[source,js]
----
{
"find": {
"filter": {
"languages": [
"English"
],
"runtime": {
"$gt":90
}
},
"projection":{
"tomatoes": 1
}
}
}
----
// NOTCONSOLE
Where the available options are:
* `allow_disk_use` (true, false) — When set to true, the server can write temporary data to disk while executing the find operation. This option is only available on MongoDB server versions 4.4 and newer.
* `allow_partial_results` (true, false) — Allows the query to get partial results if some shards are down.
* `batch_size` (Integer) — The number of documents returned in each batch of results from MongoDB.
* `filter` (Object) — The filter criteria for the query.
* `limit` (Integer) — The max number of docs to return from the query.
* `max_time_ms` (Integer) — The maximum amount of time to allow the query to run, in milliseconds.
* `no_cursor_timeout` (true, false) — The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to prevent that.
* `projection` (Array, Object) — The fields to include or exclude from each doc in the result set. If an array, it should have at least one item.
* `return_key` (true, false) — Return index keys rather than the documents.
* `show_record_id` (true, false) — Return the `$recordId` for each doc in the result set.
* `skip` (Integer) — The number of docs to skip before returning results.
[discrete#es-connectors-mongodb-client-sync-rules-aggregation]
====== Aggregation pipelines
Similarly, for aggregation pipelines, the structure of the JSON DSL should look like:
[source,js]
----
{
"aggregate":{
"pipeline": [
// pipeline elements go here
],
"options": {
// pipeline options go here
}
}
}
----
// NOTCONSOLE
Where the available options are:
* `allowDiskUse` (true, false) — Set to true if disk usage is allowed during the aggregation.
* `batchSize` (Integer) — The number of documents to return per batch.
* `bypassDocumentValidation` (true, false) — Whether or not to skip document level validation.
* `collation` (Object) — The collation to use.
* `comment` (String) — A user-provided comment to attach to this command.
* `hint` (String) — The index to use for the aggregation.
* `let` (Object) — Mapping of variables to use in the pipeline. See the server documentation for details.
* `maxTimeMs` (Integer) — The maximum amount of time in milliseconds to allow the aggregation to run.
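
An aggregation rule can also be assembled programmatically and serialized to JSON. The following Python sketch follows the structure shown above, with options nested under `options`, and rejects option names that are not in the list. The helper name `aggregate_rule` and the sample pipeline are hypothetical, and the connector's own validation may differ.

```python
import json

# Option names copied from the list above.
AGGREGATE_OPTIONS = {
    "allowDiskUse", "batchSize", "bypassDocumentValidation", "collation",
    "comment", "hint", "let", "maxTimeMs",
}


def aggregate_rule(pipeline, **options):
    """Build an aggregation rule, rejecting option names not listed above."""
    unknown = sorted(set(options) - AGGREGATE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown aggregation options: {unknown}")
    rule = {"aggregate": {"pipeline": list(pipeline)}}
    if options:
        rule["aggregate"]["options"] = options
    return rule


rule = aggregate_rule(
    [{"$match": {"runtime": {"$gt": 90}}}],
    allowDiskUse=True,
    maxTimeMs=60000,
)
print(json.dumps(rule, indent=2))
```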
[discrete#es-connectors-mongodb-client-migration-from-ruby]
===== Migrating from the Ruby connector framework
As part of the 8.8.0 release, the MongoDB connector was moved from the {connectors-python}[Ruby connectors framework^] to the {connectors-python}[Elastic connector framework^].
This change introduces minor formatting modifications to data ingested from MongoDB:
1. Nested object ID field names have changed from `_id` to `id`. For example, a field named `customer._id` is now named `customer.id`.
2. The date format has changed from `YYYY-MM-DD'T'HH:mm:ss.fff'Z'` to `YYYY-MM-DD'T'HH:mm:ss`.
If your MongoDB connector stopped working after migrating from 8.7.x to 8.8.x, read the workaround outlined in <<es-connectors-known-issues>>.
If that does not work, we recommend deleting the search index attached to this connector and re-creating a MongoDB connector from scratch.
// Closing the collapsible section
===============