mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-30 10:23:41 -04:00
291 lines
9.9 KiB
Text
291 lines
9.9 KiB
Text
[#es-connectors-azure-blob]
|
|
=== Elastic Azure Blob Storage connector reference
|
|
++++
|
|
<titleabbrev>Azure Blob Storage</titleabbrev>
|
|
++++
|
|
// Attributes used in this file
|
|
:service-name: Azure Blob Storage
|
|
:service-name-stub: azure_blob_storage
|
|
|
|
The _Elastic Azure Blob Storage connector_ is a <<es-connectors,connector>> for https://azure.microsoft.com/en-us/services/storage/blobs/[Azure Blob Storage^].
|
|
|
|
This connector is written in Python using the {connectors-python}[Elastic connector framework^].
|
|
|
|
View the {connectors-python}/connectors/sources/{service-name-stub}.py[*source code* for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
|
|
|
|
.Choose your connector reference
|
|
*******************************
|
|
Are you using a managed connector on Elastic Cloud or a self-managed connector? Expand the documentation based on your deployment method.
|
|
*******************************
|
|
|
|
// //////// //// //// //// //// //// //// ////////
|
|
// //////// NATIVE CONNECTOR REFERENCE ///////
|
|
// //////// //// //// //// //// //// //// ////////
|
|
|
|
[discrete#es-connectors-azure-blob-native-connector-reference]
|
|
==== *Elastic managed connector reference*
|
|
|
|
.View *Elastic managed connector* reference
|
|
|
|
[%collapsible]
|
|
===============
|
|
|
|
[discrete#es-connectors-azure-blob-availability-prerequisites]
|
|
===== Availability and prerequisites
|
|
|
|
This connector is available as a *managed connector* on Elastic Cloud, as of *8.9.1*.
|
|
|
|
To use this connector natively in Elastic Cloud, satisfy all <<es-native-connectors-prerequisites,managed connector requirements>>.
|
|
|
|
[discrete#es-connectors-azure-blob-compatability]
|
|
===== Compatibility
|
|
|
|
This connector has not been tested with Azure Government.
|
|
Therefore we cannot guarantee that it will work with Azure Government endpoints.
|
|
For more information on Azure Government compared to Global Azure, refer to the
|
|
https://learn.microsoft.com/en-us/azure/azure-government/compare-azure-government-global-azure[official Microsoft documentation^].
|
|
|
|
[discrete#es-connectors-{service-name-stub}-create-native-connector]
|
|
===== Create {service-name} connector
|
|
|
|
include::_connectors-create-native.asciidoc[]
|
|
|
|
[discrete#es-connectors-azure-blob-usage]
|
|
===== Usage
|
|
|
|
To use this connector as a *managed connector*, see <<es-native-connectors>>.
|
|
|
|
For additional operations, see <<es-connectors-usage>>.
|
|
|
|
[discrete#es-connectors-azure-blob-configuration]
|
|
===== Configuration
|
|
|
|
The following configuration fields are required to set up the connector:
|
|
|
|
Account name::
|
|
Name of Azure Blob Storage account.
|
|
|
|
Account key::
|
|
https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal[Account key^] for the Azure Blob Storage account.
|
|
|
|
Blob endpoint::
|
|
Endpoint for the Blob Service.
|
|
|
|
Containers::
|
|
List of containers to index.
|
|
`*` will index all containers.
|
|
|
|
[discrete#es-connectors-azure-blob-documents-syncs]
|
|
===== Documents and syncs
|
|
|
|
The connector will fetch all data available in the container.
|
|
|
|
[NOTE]
|
|
====
|
|
* Content from files bigger than 10 MB won't be extracted. (Self-managed connectors can use the <<es-connectors-content-extraction-local, self-managed local extraction service>> to handle larger binary files.)
|
|
* Permissions are not synced.
|
|
**All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic Deployment.
|
|
====
|
|
|
|
[discrete#es-connectors-azure-blob-sync-types]
|
|
====== Sync types
|
|
|
|
<<es-connectors-sync-types-full,Full syncs>> are supported by default for all connectors.
|
|
|
|
This connector also supports <<es-connectors-sync-types-incremental,incremental syncs>>.
|
|
|
|
[discrete#es-connectors-azure-sync-rules]
|
|
===== Sync rules
|
|
|
|
<<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
|
|
|
|
Advanced sync rules are not available for this connector in the present version.
|
|
Currently filtering is controlled via ingest pipelines.
|
|
|
|
[discrete#es-connectors-azure-blob-content-extraction]
|
|
===== Content extraction
|
|
|
|
See <<es-connectors-content-extraction>>.
|
|
|
|
[discrete#es-connectors-azure-blob-known-issues]
|
|
===== Known issues
|
|
|
|
This connector has the following known issues:
|
|
|
|
* *`lease data` and `tier` fields are not updated in Elasticsearch indices*
|
|
+
|
|
This is because the blob timestamp is not updated.
|
|
Refer to https://github.com/elastic/connectors-python/issues/289[Github issue^].
|
|
|
|
[discrete#es-connectors-azure-blob-troubleshooting]
|
|
===== Troubleshooting
|
|
|
|
See <<es-connectors-troubleshooting>>.
|
|
|
|
[discrete#es-connectors-azure-blob-security]
|
|
===== Security
|
|
|
|
See <<es-connectors-security>>.
|
|
|
|
View the {connectors-python}/connectors/sources/azure_blob_storage.py[source code for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_)
|
|
|
|
// Closing the collapsible section
|
|
===============
|
|
|
|
|
|
// //////// //// //// //// //// //// //// ////////
|
|
// //////// CONNECTOR CLIENT REFERENCE ///////
|
|
// //////// //// //// //// //// //// //// ////////
|
|
|
|
[discrete#es-connectors-azure-blob-connector-client-reference]
|
|
==== *Self-managed connector*
|
|
|
|
.View *self-managed connector* reference
|
|
|
|
[%collapsible]
|
|
===============
|
|
|
|
[discrete#es-connectors-azure-blob-client-availability-prerequisites]
|
|
===== Availability and prerequisites
|
|
|
|
This connector is available as a self-managed *self-managed connector*.
|
|
This self-managed connector is compatible with Elastic versions *8.6.0+*.
|
|
To use this connector, satisfy all <<es-build-connector,self-managed connector requirements>>.
|
|
|
|
[discrete#es-connectors-azure-blob-client-compatability]
|
|
===== Compatibility
|
|
|
|
This connector has not been tested with Azure Government.
|
|
Therefore we cannot guarantee that it will work with Azure Government endpoints.
|
|
For more information on Azure Government compared to Global Azure, refer to the
|
|
https://learn.microsoft.com/en-us/azure/azure-government/compare-azure-government-global-azure[official Microsoft documentation^].
|
|
|
|
[discrete#es-connectors-{service-name-stub}-create-connector-client]
|
|
===== Create {service-name} connector
|
|
|
|
include::_connectors-create-client.asciidoc[]
|
|
|
|
[discrete#es-connectors-azure-blob-client-usage]
|
|
===== Usage
|
|
|
|
To use this connector as a *self-managed connector*, see <<es-build-connector>>
|
|
For additional usage operations, see <<es-connectors-usage>>.
|
|
|
|
[discrete#es-connectors-azure-blob-client-configuration]
|
|
===== Configuration
|
|
|
|
[TIP]
|
|
====
|
|
When using the <<es-build-connector, self-managed connector>> workflow, initially these fields will use the default configuration set in the {connectors-python}/connectors/sources/azure_blob_storage.py[connector source code^].
|
|
These are set in the `get_default_configuration` function definition.
|
|
|
|
These configurable fields will be rendered with their respective *labels* in the Kibana UI.
|
|
Once connected, you'll be able to update these values in Kibana.
|
|
====
|
|
|
|
The following configuration fields are required to set up the connector:
|
|
|
|
`account_name`::
|
|
Name of Azure Blob Storage account.
|
|
|
|
`account_key`::
|
|
https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal[Account key^] for the Azure Blob Storage account.
|
|
|
|
`blob_endpoint`::
|
|
Endpoint for the Blob Service.
|
|
|
|
`containers`::
|
|
List of containers to index.
|
|
`*` will index all containers.
|
|
|
|
`retry_count`::
|
|
Number of retry attempts after a failed call.
|
|
Default value is `3`.
|
|
|
|
`concurrent_downloads`::
|
|
Number of concurrent downloads for fetching content.
|
|
Default value is `100`.
|
|
|
|
`use_text_extraction_service`::
|
|
Requires a separate deployment of the <<es-connectors-content-extraction-local,Elastic Text Extraction Service>>. Requires that ingest pipeline settings disable text extraction.
|
|
Default value is `False`.
|
|
|
|
[discrete#es-connectors-azure-blob-client-docker]
|
|
===== Deployment using Docker
|
|
|
|
include::_connectors-docker-instructions.asciidoc[]
|
|
|
|
[discrete#es-connectors-azure-blob-client-documents-syncs]
|
|
===== Documents and syncs
|
|
|
|
The connector will fetch all data available in the container.
|
|
|
|
[NOTE]
|
|
====
|
|
* Content from files bigger than 10 MB won't be extracted by default. You can use the <<es-connectors-content-extraction-local, self-managed local extraction service>> to handle larger binary files.
|
|
* Permissions are not synced.
|
|
**All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic Deployment.
|
|
====
|
|
|
|
[discrete#es-connectors-azure-blob-client-sync-types]
|
|
====== Sync types
|
|
|
|
<<es-connectors-sync-types-full,Full syncs>> are supported by default for all connectors.
|
|
|
|
This connector also supports <<es-connectors-sync-types-incremental,incremental syncs>>.
|
|
|
|
[discrete#es-connectors-azure-blob-client-sync-rules]
|
|
===== Sync rules
|
|
|
|
<<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
|
|
|
|
Advanced sync rules are not available for this connector in the present version.
|
|
Currently filtering is controlled via ingest pipelines.
|
|
|
|
[discrete#es-connectors-azure-blob-client-content-extraction]
|
|
===== Content extraction
|
|
|
|
See <<es-connectors-content-extraction>>.
|
|
|
|
[discrete#es-connectors-azure-blob-client-testing]
|
|
===== End-to-end testing
|
|
|
|
The connector framework enables operators to run functional tests against a real data source.
|
|
Refer to <<es-build-connector-testing>> for more details.
|
|
|
|
To perform E2E testing for the Azure Blob Storage connector, run the following command:
|
|
|
|
[source,shell]
|
|
----
|
|
$ make ftest NAME=azure_blob_storage
|
|
----
|
|
|
|
For faster tests, add the `DATA_SIZE=small` flag:
|
|
|
|
[source,shell]
|
|
----
|
|
make ftest NAME=azure_blob_storage DATA_SIZE=small
|
|
----
|
|
|
|
[discrete#es-connectors-azure-blob-client-known-issues]
|
|
===== Known issues
|
|
|
|
This connector has the following known issues:
|
|
|
|
* *`lease data` and `tier` fields are not updated in Elasticsearch indices*
|
|
+
|
|
This is because the blob timestamp is not updated.
|
|
Refer to https://github.com/elastic/connectors/issues/289[Github issue^].
|
|
|
|
[discrete#es-connectors-azure-blob-client-troubleshooting]
|
|
===== Troubleshooting
|
|
|
|
See <<es-connectors-troubleshooting>>.
|
|
|
|
[discrete#es-connectors-azure-blob-client-security]
|
|
===== Security
|
|
|
|
See <<es-connectors-security>>.
|
|
|
|
// Closing the collapsible section
|
|
===============
|