

[#es-connectors-azure-blob]
=== Elastic Azure Blob Storage connector reference
++++
<titleabbrev>Azure Blob Storage</titleabbrev>
++++
// Attributes used in this file
:service-name: Azure Blob Storage
:service-name-stub: azure_blob_storage
The _Elastic Azure Blob Storage connector_ is a <<es-connectors,connector>> for https://azure.microsoft.com/en-us/services/storage/blobs/[Azure Blob Storage^].
This connector is written in Python using the {connectors-python}[Elastic connector framework^].
View the {connectors-python}/connectors/sources/{service-name-stub}.py[*source code* for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
.Choose your connector reference
*******************************
Are you using a managed connector on Elastic Cloud or a self-managed connector? Expand the documentation based on your deployment method.
*******************************
// //////// //// //// //// //// //// //// ////////
// //////// NATIVE CONNECTOR REFERENCE ///////
// //////// //// //// //// //// //// //// ////////
[discrete#es-connectors-azure-blob-native-connector-reference]
==== *Elastic managed connector reference*
.View *Elastic managed connector* reference
[%collapsible]
===============
[discrete#es-connectors-azure-blob-availability-prerequisites]
===== Availability and prerequisites
This connector is available as a *managed connector* on Elastic Cloud, as of *8.9.1*.
To use this connector as a managed connector in Elastic Cloud, satisfy all <<es-native-connectors-prerequisites,managed connector requirements>>.
[discrete#es-connectors-azure-blob-compatability]
===== Compatibility
This connector has not been tested with Azure Government.
Therefore we cannot guarantee that it will work with Azure Government endpoints.
For more information on Azure Government compared to Global Azure, refer to the
https://learn.microsoft.com/en-us/azure/azure-government/compare-azure-government-global-azure[official Microsoft documentation^].
[discrete#es-connectors-{service-name-stub}-create-native-connector]
===== Create {service-name} connector
include::_connectors-create-native.asciidoc[]
[discrete#es-connectors-azure-blob-usage]
===== Usage
To use this connector as a *managed connector*, see <<es-native-connectors>>.
For additional operations, see <<es-connectors-usage>>.
[discrete#es-connectors-azure-blob-configuration]
===== Configuration
The following configuration fields are required to set up the connector:

Account name::
Name of the Azure Blob Storage account.

Account key::
https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal[Account key^] for the Azure Blob Storage account.

Blob endpoint::
Endpoint for the Blob Service.

Containers::
List of containers to index.
`*` will index all containers.
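
As an illustration of how these fields fit together, the following sketch joins the account name, account key, and blob endpoint into an Azure Storage connection string and resolves the `*` wildcard for containers. The helper names `build_connection_string` and `select_containers` are hypothetical, and the connector's own source code may assemble its credentials differently.

```python
def build_connection_string(account_name: str, account_key: str, blob_endpoint: str) -> str:
    """Join the three credential fields into an Azure Storage connection string.

    Hypothetical helper for illustration; not taken from the connector source.
    """
    parts = {
        "AccountName": account_name,
        "AccountKey": account_key,
        "BlobEndpoint": blob_endpoint,
    }
    # All three fields are required, so reject empty values early.
    missing = [key for key, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    return ";".join(f"{key}={value}" for key, value in parts.items())


def select_containers(configured: list, available: list) -> list:
    """Resolve the Containers field: `*` selects every available container."""
    if "*" in configured:
        return list(available)
    # Otherwise keep only the configured names that actually exist.
    return [name for name in configured if name in available]
```

For example, `select_containers(["*"], ["logs", "images"])` returns both containers, while a list of explicit names is filtered down to those that exist in the account.
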
[discrete#es-connectors-azure-blob-documents-syncs]
===== Documents and syncs
The connector will fetch all data available in the container.
[NOTE]
====
* Content from files larger than 10 MB won't be extracted. (Self-managed connectors can use the <<es-connectors-content-extraction-local, self-managed local extraction service>> to handle larger binary files.)
* Permissions are not synced.
**All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic deployment.
====
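
The size limit described in the note can be pictured with the following minimal sketch. It assumes that 10 MB means 10 * 1024 * 1024 bytes and uses hypothetical field names (`title`, `size`, `body`); the connector framework's actual threshold handling may differ.

```python
# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_EXTRACT_BYTES = 10 * 1024 * 1024


def should_extract_content(size_in_bytes: int, limit: int = MAX_EXTRACT_BYTES) -> bool:
    """Return True when a blob is small enough for content extraction."""
    return size_in_bytes <= limit


def to_document(blob: dict) -> dict:
    """Build an index document; metadata is always kept, text only if small enough.

    The field names here are hypothetical and chosen for illustration.
    """
    doc = {"title": blob["name"], "size": blob["size"]}
    if should_extract_content(blob["size"]):
        doc["body"] = blob.get("text", "")
    return doc
```

In this sketch an oversized blob still produces a document with its metadata; only the extracted text is omitted.
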
[discrete#es-connectors-azure-blob-sync-types]
====== Sync types
<<es-connectors-sync-types-full,Full syncs>> are supported by default for all connectors.
This connector also supports <<es-connectors-sync-types-incremental,incremental syncs>>.
[discrete#es-connectors-azure-sync-rules]
===== Sync rules
<<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
Advanced sync rules are not available for this connector in the present version.
Currently, filtering is controlled through ingest pipelines.
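
As a rough sketch of pipeline-based filtering, the following builds an ingest pipeline definition with a `drop` processor that discards documents from one container. It assumes that indexed documents carry a `container` field; the function name `build_drop_pipeline` and the coarse name check are hypothetical.

```python
import re

# Coarse sanity check for container names (lowercase letters, digits, hyphens).
# It also keeps quotes out of the generated Painless condition.
_CONTAINER_NAME = re.compile(r"^[a-z0-9-]{3,63}$")


def build_drop_pipeline(container_to_skip: str) -> dict:
    """Return an ingest pipeline body that drops documents from one container.

    Assumption: each indexed document has a `container` field.
    """
    if not _CONTAINER_NAME.match(container_to_skip):
        raise ValueError(f"Unexpected container name: {container_to_skip!r}")
    return {
        "description": f"Drop documents from the '{container_to_skip}' container",
        "processors": [
            {
                "drop": {
                    # Painless condition evaluated against each incoming document.
                    "if": f"ctx['container'] == '{container_to_skip}'"
                }
            }
        ],
    }
```

A definition like this could be stored with the Elasticsearch put pipeline API and then referenced from the connector's ingest pipeline settings.
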
[discrete#es-connectors-azure-blob-content-extraction]
===== Content extraction
See <<es-connectors-content-extraction>>.
[discrete#es-connectors-azure-blob-known-issues]
===== Known issues
This connector has the following known issues:

* *`lease data` and `tier` fields are not updated in Elasticsearch indices*
+
This is because the blob timestamp is not updated when these fields change.
Refer to the https://github.com/elastic/connectors-python/issues/289[GitHub issue^] for details.
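
The effect of this issue can be illustrated with a minimal sketch of timestamp-based change detection. It assumes that a sync decides whether to re-index a blob by comparing a stored timestamp with the blob's last-modified time; the function and field names are hypothetical and are not taken from the connector source.

```python
def needs_reindex(stored_timestamp: str, blob_last_modified: str) -> bool:
    """Re-index only when the blob's last-modified time has changed."""
    return stored_timestamp != blob_last_modified


# Document as it was indexed during an earlier sync (hypothetical fields).
indexed_doc = {"_timestamp": "2023-05-01T10:00:00Z", "tier": "Hot"}

# The blob's tier was changed afterwards, but its timestamp stayed the same.
current_blob = {"last_modified": "2023-05-01T10:00:00Z", "tier": "Cool"}

# The timestamps match, so the changed tier is never picked up.
stale = not needs_reindex(indexed_doc["_timestamp"], current_blob["last_modified"])
```

Here `stale` is `True`: the index still holds the old `tier` value because nothing signalled that the blob had changed.
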
[discrete#es-connectors-azure-blob-troubleshooting]
===== Troubleshooting
See <<es-connectors-troubleshooting>>.
[discrete#es-connectors-azure-blob-security]
===== Security
See <<es-connectors-security>>.
View the {connectors-python}/connectors/sources/azure_blob_storage.py[source code for this connector^] (branch _{connectors-branch}_, compatible with Elastic _{minor-version}_).
// Closing the collapsible section
===============
// //////// //// //// //// //// //// //// ////////
// //////// CONNECTOR CLIENT REFERENCE ///////
// //////// //// //// //// //// //// //// ////////
[discrete#es-connectors-azure-blob-connector-client-reference]
==== *Self-managed connector*
.View *self-managed connector* reference
[%collapsible]
===============
[discrete#es-connectors-azure-blob-client-availability-prerequisites]
===== Availability and prerequisites
This connector is available as a *self-managed connector*.
This self-managed connector is compatible with Elastic versions *8.6.0+*.
To use this connector, satisfy all <<es-build-connector,self-managed connector requirements>>.
[discrete#es-connectors-azure-blob-client-compatability]
===== Compatibility
This connector has not been tested with Azure Government.
Therefore we cannot guarantee that it will work with Azure Government endpoints.
For more information on Azure Government compared to Global Azure, refer to the
https://learn.microsoft.com/en-us/azure/azure-government/compare-azure-government-global-azure[official Microsoft documentation^].
[discrete#es-connectors-{service-name-stub}-create-connector-client]
===== Create {service-name} connector
include::_connectors-create-client.asciidoc[]
[discrete#es-connectors-azure-blob-client-usage]
===== Usage
To use this connector as a *self-managed connector*, see <<es-build-connector>>.
For additional operations, see <<es-connectors-usage>>.
[discrete#es-connectors-azure-blob-client-configuration]
===== Configuration
[TIP]
====
When using the <<es-build-connector, self-managed connector>> workflow, initially these fields will use the default configuration set in the {connectors-python}/connectors/sources/azure_blob_storage.py[connector source code^].
These are set in the `get_default_configuration` function definition.
These configurable fields will be rendered with their respective *labels* in the Kibana UI.
Once connected, you'll be able to update these values in Kibana.
====
The following configuration fields are required to set up the connector:

`account_name`::
Name of the Azure Blob Storage account.

`account_key`::
https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal[Account key^] for the Azure Blob Storage account.

`blob_endpoint`::
Endpoint for the Blob Service.

`containers`::
List of containers to index.
`*` will index all containers.

`retry_count`::
Number of retry attempts after a failed call.
Default value is `3`.

`concurrent_downloads`::
Number of concurrent downloads for fetching content.
Default value is `100`.

`use_text_extraction_service`::
Requires a separate deployment of the <<es-connectors-content-extraction-local,Elastic Text Extraction Service>>, and requires that text extraction is disabled in the ingest pipeline settings.
Default value is `False`.
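
To make the roles of `retry_count` and `concurrent_downloads` more concrete, the following sketch applies both settings to a generic asynchronous download function. It assumes that `retry_count` counts the extra attempts made after the first failed call; the helpers `with_retries` and `download_all` and the `fetch` callback are hypothetical and do not mirror the connector's implementation.

```python
import asyncio

# Default values as documented for this connector.
DEFAULTS = {
    "retry_count": 3,
    "concurrent_downloads": 100,
    "use_text_extraction_service": False,
}


async def with_retries(operation, retry_count: int):
    """Await operation(); after a failure, retry up to retry_count more times."""
    attempt = 0
    while True:
        try:
            return await operation()
        except Exception:
            if attempt >= retry_count:
                raise  # retries exhausted, surface the last error
            attempt += 1
            await asyncio.sleep(0)  # a real client would back off here


async def download_all(names, fetch, config=None):
    """Download every named blob, bounded by concurrent_downloads."""
    settings = {**DEFAULTS, **(config or {})}
    semaphore = asyncio.Semaphore(settings["concurrent_downloads"])

    async def download_one(name):
        # The semaphore caps how many downloads run at the same time.
        async with semaphore:
            return await with_retries(lambda: fetch(name), settings["retry_count"])

    return await asyncio.gather(*(download_one(name) for name in names))
```

With the defaults, a download that fails twice and then succeeds completes on its third call, and no more than 100 downloads are in flight at once.
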
[discrete#es-connectors-azure-blob-client-docker]
===== Deployment using Docker
include::_connectors-docker-instructions.asciidoc[]
[discrete#es-connectors-azure-blob-client-documents-syncs]
===== Documents and syncs
The connector will fetch all data available in the container.
[NOTE]
====
* Content from files larger than 10 MB won't be extracted by default. You can use the <<es-connectors-content-extraction-local, self-managed local extraction service>> to handle larger binary files.
* Permissions are not synced.
**All documents** indexed to an Elastic deployment will be visible to **all users with access** to that Elastic deployment.
====
[discrete#es-connectors-azure-blob-client-sync-types]
====== Sync types
<<es-connectors-sync-types-full,Full syncs>> are supported by default for all connectors.
This connector also supports <<es-connectors-sync-types-incremental,incremental syncs>>.
[discrete#es-connectors-azure-blob-client-sync-rules]
===== Sync rules
<<es-sync-rules-basic,Basic sync rules>> are identical for all connectors and are available by default.
Advanced sync rules are not available for this connector in the present version.
Currently, filtering is controlled through ingest pipelines.
[discrete#es-connectors-azure-blob-client-content-extraction]
===== Content extraction
See <<es-connectors-content-extraction>>.
[discrete#es-connectors-azure-blob-client-testing]
===== End-to-end testing
The connector framework enables operators to run functional tests against a real data source.
Refer to <<es-build-connector-testing>> for more details.
To perform E2E testing for the Azure Blob Storage connector, run the following command:
[source,shell]
----
make ftest NAME=azure_blob_storage
----
For faster tests, add the `DATA_SIZE=small` flag:
[source,shell]
----
make ftest NAME=azure_blob_storage DATA_SIZE=small
----
[discrete#es-connectors-azure-blob-client-known-issues]
===== Known issues
This connector has the following known issues:

* *`lease data` and `tier` fields are not updated in Elasticsearch indices*
+
This is because the blob timestamp is not updated when these fields change.
Refer to the https://github.com/elastic/connectors/issues/289[GitHub issue^] for details.
[discrete#es-connectors-azure-blob-client-troubleshooting]
===== Troubleshooting
See <<es-connectors-troubleshooting>>.
[discrete#es-connectors-azure-blob-client-security]
===== Security
See <<es-connectors-security>>.
// Closing the collapsible section
===============