[HTTP] Follow up on dev doc additions for terraform-friendly HTTP APIs (#225317)

## Summary

Follow up PR https://github.com/elastic/kibana/pull/224348

Expanded the original document with 3 new sections.

<img width="753" alt="Screenshot 2025-06-25 at 17 13 13"
src="https://github.com/user-attachments/assets/9ccb5da4-dbd0-4c35-bd76-3b4d5cd7fa2f"
/>

<img width="760" alt="Screenshot 2025-06-25 at 17 13 19"
src="https://github.com/user-attachments/assets/32ba114a-e50d-4e38-9a0d-f62dc14f988b"
/>

(We can consider deleting this last section, as I'm not sure it'll be
worth it)
<img width="756" alt="Screenshot 2025-06-25 at 17 13 28"
src="https://github.com/user-attachments/assets/143666aa-78fa-42ab-880a-f5428ae4183f"
/>

---------

Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Jean-Louis Leysens 2025-06-27 12:38:49 +02:00 committed by GitHub
parent e1cbc3c9f6
commit 506db10ae9


@ -19,7 +19,7 @@ Terraform can work with any API, but some APIs are easier to deal with than othe
### Think in terms of resources
APIs that stick to a consistent set of arguments or parameters are much easier to map into Terraform. Terraform can describe the "thing" your API is managing. RPC-based APIs are harder to work with since they're not declarative. Your HTTP APIs should describe REST-like actions (GET, POST, DELETE, etc.) against resources, not remote procedures like `executeJob`.
APIs designed around resources are easier to support than those focused on action-oriented endpoints.
@ -99,16 +99,140 @@ resource "elasticstack_elasticsearch_index_lifecycle" "my_ilm" {
}
```
### Implement complete CRUD operations
For Terraform to properly manage resources, your API must implement a complete set of CRUD operations on each resource. This is essential for the Terraform lifecycle (create, read, update, delete) to work properly.
#### Required endpoints for each resource
For every resource type, implement these HTTP endpoints:
```
GET /api/resource/{id} # Read - retrieve an existing resource
POST /api/resource # Create - create a new resource
PUT /api/resource/{id} # Update - update an existing resource
DELETE /api/resource/{id} # Delete - remove an existing resource
GET /api/resource # List - retrieve all resources (with pagination)
```
<DocCallOut title="Terraform import">
The "List" operation is critical for Terraform's data source and import functionality ([see the docs](https://developer.hashicorp.com/terraform/cli/import)).
</DocCallOut>
#### Implementation considerations
1. **Consistent response structures**: Ensure GET and POST/PUT responses return the same structure with identical fields.
2. **Idempotent operations**: POST for creation and PUT for updates should be idempotent - running the same request multiple times should result in the same state. Relatedly, reads (GETs) should be free of side effects.
3. **Complete state**: After each operation, return the complete state of the resource, not just acknowledgment.
```json
// Good - returns complete state
{
  "id": "my-resource", // it is best to call this field "id"
  "name": "My Resource",
  "config": { "setting1": "value1" },
  "created_at": "2025-06-24T08:15:30Z",
  "updated_at": "2025-06-24T08:15:30Z"
}

// Bad - returns only acknowledgment
{
  "result": "success",
  "message": "Resource created successfully"
}
```
4. **HTTP status codes**: Use appropriate HTTP status codes:
- `200/201` for successful operations
- `404` when a resource doesn't exist
- `409` for conflicts (e.g., resource already exists)
- `400` for validation errors
5. **Validation**: Validate input at creation/update time and return comprehensive errors.
#### Example: Complete API for a resource
```typescript
// Retrieve 1 resource
router.get(
  {
    path: '/api/resource/{id}',
    validate: {
      params: schema.object({
        id: schema.string(),
      }),
      // note: GET routes should not accept request bodies
    },
  },
  async (context, request, response) => {
    // Implementation...
    return response.ok({ body: completeResourceState });
  }
);

// Create, update
router.(post|put)(
  {
    path: '/api/resource',
    validate: {
      body: schema.object({
        name: schema.string(),
        config: schema.object({...}),
      }),
    },
  },
  async (context, request, response) => {
    // Implementation...
    return response.ok({ body: completeResourceState });
  }
);

// Delete
router.delete(
  {
    path: '/api/resource/{id}',
    validate: {
      params: schema.object({
        id: schema.string(),
      }),
    },
  },
  async (context, request, response) => {
    // Implementation...
    return response.ok();
  }
);

// List many resources
router.get(
  {
    path: '/api/resource',
    validate: {
      query: schema.object({
        page: schema.maybe(schema.number()),
        perPage: schema.maybe(schema.number()),
      }),
    },
  },
  async (context, request, response) => {
    // Implementation...
    return response.ok({
      body: {
        items: resources,
        total: totalCount,
        page: currentPage,
        perPage: itemsPerPage,
      },
    });
  }
);
```
#### API-imposed challenges
API design decisions can create real challenges for Terraform resource implementation. The Elasticsearch [Index Settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-settings) (`PUT /{index}/_settings`) treats static settings (which require index recreation) and dynamic settings (which can be updated in place) identically. This forces the implementation to handle all settings together, because the API doesn't offer a way to distinguish them upfront.
```json
{
  "error": {
    "reason": "Can't update non dynamic settings [[index.codec, index.number_of_shards]] for open indices"
  }
}
```
@ -116,6 +240,100 @@ API design decisions can create real challenges for Terraform resource implement
A better API design would separate these concerns with different endpoints or provide metadata about which settings require recreation.
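As a sketch of that alternative, the settings response could carry per-setting metadata telling clients whether a change can be applied in place. All names below are hypothetical, not an existing Elasticsearch API:

```typescript
// Hypothetical response shape: each setting declares whether it is dynamic.
interface SettingDescriptor {
  value: string | number;
  dynamic: boolean; // false => changing it requires index recreation
}

// What a metadata-aware GET of index settings could return
const settings: Record<string, SettingDescriptor> = {
  'index.number_of_replicas': { value: 2, dynamic: true },
  'index.number_of_shards': { value: 1, dynamic: false },
  'index.codec': { value: 'best_compression', dynamic: false },
};

// A Terraform provider could then split a plan into an in-place update and a
// recreation without hard-coding knowledge about individual settings.
const needsRecreation = Object.entries(settings)
  .filter(([, setting]) => !setting.dynamic)
  .map(([name]) => name);
```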
### Handle resource references and dependencies
Terraform configurations frequently define multiple resources that depend on each other. Your API design needs to account for these interdependencies in a way that enables Terraform's declarative model to work smoothly.
#### Use stable, predictable resource identifiers
Resource references must use identifiers that remain stable throughout a resource's lifecycle and across operations:
```json
// Resource reference using stable ID
{
  "name": "My Report",
  "space_id": "marketing", // Reference to a space by ID
  "visualization_ids": ["vis-123"] // References to visualizations by ID
}
```
Note: provide users with a way to choose their own ID when creating resources; otherwise, use auto-generated UUIDs.
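A minimal sketch of that pattern (the handler logic and names are assumptions, using Node's built-in UUID generator):

```typescript
import { randomUUID } from 'crypto';

// Sketch: the create body marks `id` as optional (e.g. schema.maybe(schema.string())),
// so callers can pick a stable, meaningful id; otherwise we generate a UUID.
interface CreateResourceBody {
  id?: string; // client-chosen identifier, optional
  name: string;
}

function resolveId(body: CreateResourceBody): string {
  return body.id ?? randomUUID();
}
```

Client-chosen IDs matter for Terraform because they make `terraform import` and cross-resource references predictable.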
#### Support referencing resources by identifiers, not just names
Ensure your API accepts references by ID, not just by name or other properties that may change:
```typescript
// Good: Reference by ID
router.post(
  {
    path: '/api/alerting/rule',
    validate: {
      body: schema.object({
        name: schema.string(),
        space_id: schema.string(), // Space ID reference
        actions: schema.arrayOf(schema.object({
          connector_id: schema.string(), // Connector ID reference
          group: schema.string(),
          // etc.
        })),
      }),
    },
  },
  handler
);
```
#### Use consistent reference patterns across APIs
Apply the same patterns for referencing resources across all your APIs:
1. **Consistent naming**: Use the same suffix for IDs (e.g., `space_id`, `dashboard_id`, etc.).
2. **Consistent structure**: Use the same structure for referencing resources in all APIs.
3. **Consistent validation**: Apply the same validation rules across all APIs.
#### Example: Resource with dependencies
```typescript
// Generic object API that references other objects and spaces
router.post(
  {
    path: '/api/objects/object',
    validate: {
      body: schema.object({
        title: schema.string(),
        description: schema.maybe(schema.string()),
        // Reference to space by ID
        space_id: schema.string(),
        // References to other objects by ID
        other_objects: schema.arrayOf(
          schema.object({
            object_id: schema.string(),
          })
        ),
      }),
    },
  },
  async (context, request, response) => {
    // Implementation with dependency validation
    const validationResult = await validateDependencies(request);
    if (!validationResult.valid) {
      return response.badRequest({
        body: {
          message: 'Validation failed',
          validation: {
            dependencies: validationResult.errors,
          },
        },
      });
    }
    // Proceed with creation
    // ...
  }
);
```
### Design human-friendly APIs
APIs with clear, descriptive field names and logical resource structures make both manual testing and Terraform resource development much smoother.
@ -152,8 +370,8 @@ mappings = jsonencode({
})
```
The hiccup here is with validation. Terraform just isn't designed to do validation (or testing!) like we would with conventional languages.
It doesn't validate what strings contain, only that they're parsable during `terraform plan`.
Resource developers often fall back to the API layer for validation checks, which can lead to plan/apply issues.
### Return errors early—and all at once
@ -222,7 +440,7 @@ OpenAPI specifications make generating consistent Kibana and Fleet clients possi
```bash
# From Makefile - generates standardized clients
internal/clients/fleet/:
oapi-codegen -package fleet fleet-openapi.json
```
@ -264,9 +482,9 @@ Terraform import brings existing resources under management by reading their cur
### Be predictable
Terraform relies on the same input resulting in the same output. Avoid using fields like "last modified time"—Terraform can't compare those meaningfully. This can be tricky with sensitive fields or when your API relies on randomness.
Constantly changing fields in API responses create a difficult situation for Terraform implementations because they cause constant drift.
Timestamps and other time-based fields aren't predictable:
@ -283,7 +501,7 @@ The [Alerting Rules API](https://www.elastic.co/docs/api/doc/kibana/operation/op
```
"execution_status": {
"status": "active",
"status": "active",
"last_execution_date": "2023-12-07T22:36:41.358Z", // Changes on every read
"last_duration": 736
}
@ -291,7 +509,7 @@ The [Alerting Rules API](https://www.elastic.co/docs/api/doc/kibana/operation/op
Users see Terraform detecting "changes" on every refresh, even when nothing in their configuration has actually changed!
An alternative is to separate volatile runtime data from stable configuration data in responses:
```json
{
@ -307,7 +525,7 @@ An alternative is to separate volatile runtime data from stable configuration da
}
```
Volatile fields create a challenging choice: include them (causing constant configuration drift) or exclude them (losing valuable runtime information).
Including and separating them makes it easier for the TF provider to specifically ignore sets of fields (marking them as [readonly](https://registry.terraform.io/providers/elastic/elasticstack/latest/docs/resources/kibana_alerting_rule#read-only)) but needs to be built into the client generator.
@ -325,13 +543,13 @@ resource "elasticstack_transform" "example" {
group_by = "field_name" # String type
}
# Version 2.0 - same field became an array
resource "elasticstack_transform" "example" {
group_by = ["field_name"] # Array type - BREAKING CHANGE!
}
```
Another example is when the Kibana SLO API changed `group_by` from `string` to `string | string[]`.
Changing a field to make it more lenient might look harmless on the surface but means that the API will return values clients don't expect.
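One mitigation (a sketch, not how the SLO API actually behaves) is to keep echoing the narrow shape the caller used, even after widening the accepted input type:

```typescript
type GroupBy = string | string[];

// Store a canonical array internally, but remember which shape the caller sent
// so responses keep matching what clients (and Terraform state) already hold.
function presentGroupBy(stored: string[], sentAsString: boolean): GroupBy {
  return sentAsString && stored.length === 1 ? stored[0] : stored;
}
```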
@ -363,7 +581,6 @@ resource "elasticstack_elasticsearch_index" "example" {
# Structured fields for simple, predictable values
number_of_shards = 1
number_of_replicas = 2
# JSON strings only for truly complex, variable structures
mappings = jsonencode({
properties = {
@ -396,9 +613,9 @@ Consider the scenario:
A user wants to create alerting rules with different configurations, depending on the space they are in.
In Kibana's UI, they'd create a space, change into that space and then create the rule.
Through curl, the flow would be similar, although they'd probably GET the space to make sure it exists before POSTing the rule.
In Terraform, it's different: both resources can be (and often are) configured at the same time! In the scenario above, the client [needs the space ID](https://github.com/elastic/terraform-provider-elasticstack/blame/0826f4385a29fa58a79f785249f204e13ee3d47c/internal/clients/kibana/alerting.go#L192) to create the rule:
@ -434,7 +651,7 @@ private async persistAlertsHelper() {
### Offer compare-and-swap
Between refreshing state and applying changes, there's a gap in which someone else could modify the resource through your API. Generally it's ok to take the approach of last-write-wins.
If you have to consider concurrency control, support mechanisms like ETags and checksums, and the `version` property on saved objects.
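A compare-and-swap check could look like this sketch, with a content hash standing in for a real ETag or saved-object `version` (the helper names are illustrative):

```typescript
import { createHash } from 'crypto';

// Derive an entity tag from the stored document.
const etagOf = (doc: unknown): string =>
  createHash('sha256').update(JSON.stringify(doc)).digest('hex');

// If the client sends If-Match, reject the write when the tag no longer matches
// (someone else wrote in between); otherwise fall back to last-write-wins.
function writeAllowed(currentDoc: unknown, ifMatch?: string): boolean {
  if (ifMatch === undefined) return true; // no CAS requested
  return ifMatch === etagOf(currentDoc); // false => respond 412 Precondition Failed
}
```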
@ -442,13 +659,12 @@ If you have to consider concurrency control, support mechanisms like ETags and c
The robustness principle says to accept input in various formats but return output in a consistent format, for example converting strings to lowercase, ordering elements in a list or changing whitespace in `json` strings.
For Terraform, don't do this.
Terraform does byte-for-byte comparisons, so normalized output forces provider developers to implement logic to handle different data formats, ordering, capitalization variations and other unnecessary complexity.
Fortunately, the Kibana client hasn't needed to handle complex normalization yet. Let's keep it that way: return data exactly as you received it.
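Concretely, that can mean storing and echoing the caller's raw bytes instead of re-serializing them (a sketch; `storeAndEcho` is a made-up helper):

```typescript
// Validate the payload without normalizing it: parse to check it's JSON,
// but store and return the original string untouched.
function storeAndEcho(rawBody: string): { stored: string; echoed: string } {
  JSON.parse(rawBody); // throws on invalid JSON
  return { stored: rawBody, echoed: rawBody }; // never JSON.stringify(parsed)
}
```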
Follow these tips, and you'll make your API a dream to work with for Terraform provider developers.
### Example
@ -500,3 +716,129 @@ resource "elasticstack_kibana_space" "example" {
}
```
### (Optional) Support bulk operations
Terraform configurations may manage many resources at once. When operating at scale, API efficiency may be a crucial factor. Supporting bulk operations allows Terraform providers to optimize performance and provide a better experience for users managing many resources.
<DocCallOut title="Reach out to Core for guidance" color="warning">
You may not need to implement bulk operations. Using bulk APIs may require additional work in
the Terraform provider implementation, and your bulk API needs careful design to work well as Terraform configuration.
</DocCallOut>
#### Why bulk operations matter for Terraform
When a Terraform configuration manages hundreds of similar resources, individual API calls for each resource can lead to:
- **Performance bottlenecks**: Each HTTP request adds network latency
- **Rate limiting issues**: Many sequential calls may trigger rate limiting
#### Implement efficient bulk endpoints
Add these bulk operation endpoints to complement your individual resource operations:
```
POST /api/resource/_bulk_create # Create multiple resources
POST /api/resource/_bulk_update # Update multiple resources
POST /api/resource/_bulk_delete # Delete multiple resources
```
#### Design principles for bulk APIs
1. **Atomic operations**: If one operation fails, provide clear options:
- Allow partial success with detailed reports
- Support all-or-nothing transactions when needed
2. **Consistent response format**: Return individual status for each item in the batch:
```json
{
  "items": [
    {
      "id": "resource-1",
      "status": "created",
      "result": { /* complete resource state */ }
    },
    {
      "id": "resource-2",
      "status": "error",
      "error": { "message": "Validation failed", "code": 400 }
    }
  ],
  "took": 42,
  "errors": true
}
```
3. **Reasonable batch sizes**: Document recommended batch sizes and implement server-side limits
4. **Idempotency**: Ensure bulk operations are idempotent for retry safety
5. **Consistent error handling**: Provide detailed errors for each item in the batch
#### Example: Bulk resource creation
```typescript
// Bulk create resources
router.post(
  {
    path: '/api/resources/_bulk_create',
    validate: {
      body: schema.arrayOf(
        schema.object({
          id: schema.string(), // Client-specified ID
          title: schema.string(),
          type: schema.string(),
          space_id: schema.string(),
          // etc.
        })
      ),
    },
  },
  async (context, request, response) => {
    const resources = request.body;
    const results = [];
    // Process each item in the batch
    for (const resource of resources) {
      try {
        const result = await createResource(resource);
        results.push({
          id: resource.id,
          status: 'created',
          result,
        });
      } catch (error) {
        results.push({
          id: resource.id,
          status: 'error',
          error: {
            message: error.message,
            code: error.statusCode || 500,
          },
        });
      }
    }
    const hasErrors = results.some(item => item.status === 'error');
    return response.ok({
      body: {
        items: results,
        took: Date.now() - request.info.received,
        errors: hasErrors,
      },
    });
  }
);
```
#### Optimizations for Terraform providers
When implementing bulk operations, consider these Terraform-specific optimizations:
1. **Batch size recommendations**: Document optimal batch sizes to help provider developers maximize performance
2. **Error mapping**: Design error responses that map cleanly to Terraform error handling patterns
3. **State synchronization**: Include complete resource state in responses to avoid additional GET requests