For high-volume applications, you can use sampling to control what percentage of your data is ingested. This helps manage costs while still capturing representative data and ensuring critical information always gets through.

How Sampling Works

Both /messages and /events endpoints support sampling via two fields:
Field | Type | Description
sampleRate | number (0-1) | The probability that this request will be ingested. 0.1 means 10% of requests are stored. Defaults to 1.0 (all requests).
forceSample | boolean | When true, bypasses sampling and ensures the request is always ingested.
Free plan customers are not affected by sampling. All data is ingested regardless of the sampleRate value.
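
As a minimal sketch of how the two fields might be attached to a request body on the client side, the helper below validates sampleRate against the documented 0-1 range before merging it in. The name withSampling is hypothetical and is not part of any Greenflash SDK; only the sampleRate and forceSample field names come from the table above.

```javascript
// Hypothetical helper (not part of any Greenflash SDK): merges the sampling
// fields into a request body after validating them.
function withSampling(body, { sampleRate, forceSample } = {}) {
  const result = { ...body };
  if (sampleRate !== undefined) {
    const valid =
      typeof sampleRate === 'number' &&
      !Number.isNaN(sampleRate) &&
      sampleRate >= 0 &&
      sampleRate <= 1;
    if (!valid) {
      throw new RangeError('sampleRate must be a number between 0 and 1');
    }
    result.sampleRate = sampleRate;
  }
  // Only send forceSample when it is explicitly true.
  if (forceSample === true) {
    result.forceSample = true;
  }
  return result;
}

const sampled = withSampling({ eventType: 'feature_used' }, { sampleRate: 0.1 });
const forced = withSampling({ eventType: 'purchase_completed' }, { forceSample: true });
```

Rejecting out-of-range values locally is a design choice for this sketch; how the API itself treats an invalid sampleRate is not stated here.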

Messages vs. Events

Sampling works differently for messages and events to optimize for their distinct use cases:

Messages: Conversation-Level Sampling

For /messages, sampling is deterministic per conversation. Either all messages in a conversation are ingested or none are. This ensures:
  • Complete conversation context is preserved
  • No fragmented conversations with missing messages
  • Consistent behavior across retries and multiple message batches
# All messages for this conversation will be kept or dropped together
curl --request POST \
  --url https://www.greenflash.ai/api/v1/messages \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "externalConversationId": "conv-123",
    "productId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "messages": [{"role": "user", "content": "Hello"}],
    "sampleRate": 0.1
  }'
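
To illustrate what "deterministic per conversation" means, here is a sketch of one common technique: hash the conversation ID to a stable value in [0, 1) and compare it to the rate. This is an assumption for illustration only; the actual algorithm Greenflash uses is not documented here, and the function names are hypothetical.

```javascript
// Illustration only: one common way to make a keep/drop decision stable
// per conversation. This is NOT Greenflash's documented algorithm.

// 32-bit FNV-1a style hash over the string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash >>> 0;
}

// Maps the conversation ID to a stable bucket in [0, 1) and compares it
// to the rate, so the same ID always yields the same decision.
function keepConversation(externalConversationId, sampleRate) {
  const bucket = fnv1a(externalConversationId) / 0x100000000;
  return bucket < sampleRate;
}
```

Because the decision depends only on the ID and the rate, retries and later message batches for the same conversation land on the same side of the threshold.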

Events: Per-Event Sampling

For /events, sampling is non-deterministic per event. Each event has an independent probability of being ingested. This ensures:
  • Even distribution across all organizations and event types
  • No entire buckets (org, eventType) are permanently included or excluded
  • Representative sampling across your entire event stream
# Each event is independently sampled
curl --request POST \
  --url https://www.greenflash.ai/api/v1/events \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "eventType": "feature_used",
    "productId": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "value": "search",
    "sampleRate": 0.1
  }'
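
By contrast, independent per-event sampling can be modeled as a fresh random draw for every event. The sketch below is a simulation under that assumption, not the service's implementation; keepEvent and simulate are hypothetical names, and the random source is injectable so the behavior can be checked.

```javascript
// Illustration only: models independent per-event sampling. The real
// decision is made by the API; `random` is injectable for testing.
function keepEvent(sampleRate, random = Math.random) {
  return random() < sampleRate;
}

// Counts how many of `total` events would be kept at the given rate.
function simulate(total, sampleRate, random = Math.random) {
  let kept = 0;
  for (let i = 0; i < total; i++) {
    if (keepEvent(sampleRate, random)) kept++;
  }
  return kept;
}
```

With Math.random as the source, simulate(10000, 0.1) lands near 1,000 but varies from run to run, which is the point: no event type or organization is systematically included or excluded.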

When to Use Sampling

High-Frequency Data

If you’re tracking high-volume data like general usage events or routine conversations, sample at 10-20% to capture trends without storing every occurrence.
# Sample 10% of routine feature usage
{
  "eventType": "page_view",
  "sampleRate": 0.1
}

Critical Data

Always use forceSample: true for high-value data that should never be dropped:
  • Events: purchase_completed, subscription_started, churn_detected
  • Messages: Support escalations, error conversations, VIP customer interactions
# Always capture purchase events
{
  "eventType": "purchase_completed",
  "influence": "positive",
  "value": "299.00",
  "valueType": "currency",
  "forceSample": true
}
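
One way to apply both rules consistently is to keep a single list of critical event types and derive the sampling fields from it. The following is a sketch under that assumption; CRITICAL_EVENT_TYPES and samplingFor are hypothetical names, and the 0.1 default simply mirrors the examples above.

```javascript
// Hypothetical policy: event types listed here are always captured.
const CRITICAL_EVENT_TYPES = new Set([
  'purchase_completed',
  'subscription_started',
  'churn_detected',
]);

// Returns the sampling fields to attach to an event body.
function samplingFor(eventType, defaultRate = 0.1) {
  return CRITICAL_EVENT_TYPES.has(eventType)
    ? { forceSample: true }
    : { sampleRate: defaultRate };
}

const routineBody = { eventType: 'page_view', ...samplingFor('page_view') };
const criticalBody = {
  eventType: 'purchase_completed',
  ...samplingFor('purchase_completed'),
};
```

Centralizing the decision keeps a critical event from being sampled by accident when a new call site is added.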

Response for Dropped Requests

When a request is dropped due to sampling, the API returns a 204 No Content response. This indicates the request was valid but intentionally not processed. Your integration should handle this gracefully:
const response = await fetch('https://www.greenflash.ai/api/v1/events', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    eventType: 'feature_used',
    productId: productId,
    sampleRate: 0.1
  })
});

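if (response.status === 204) {
  // Request was valid but dropped due to sampling - this is expected
  console.log('Event sampled out');
} else if (response.ok) {
  // Event was ingested
  const data = await response.json();
  console.log('Event created:', data.eventId);
} else {
  // Any other status is a real failure, not a sampling decision
  console.error('Event request failed with status', response.status);
}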

Best Practices

  1. Start with a sampleRate of 1.0 (the default, which ingests everything) during development and initial rollout to ensure your integration is working correctly.
  2. Reduce sampling gradually as volume increases. Monitor your analytics to ensure you’re still capturing representative data.
  3. Never sample critical business events. Use forceSample: true for events that directly impact revenue or customer success metrics.
  4. Consider conversation importance when sampling messages. High-value customer conversations or error scenarios should use forceSample: true.
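  5. Monitor the 204 response rate to verify your sampling is working as expected. For example, events sent with sampleRate: 0.1 should return 204 roughly 90% of the time on plans where sampling applies.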
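
For the last point, a small client-side counter is one way to compare the observed 204 rate with the configured rate. This is a sketch only: SamplingMonitor is a hypothetical class, the 0.05 tolerance is an arbitrary assumption, and the comparison is meaningful only for requests sent without forceSample. For /messages, the rate applies per conversation rather than per request, so the per-request drop rate may differ.

```javascript
// Hypothetical client-side counter for comparing the observed drop rate
// with the configured sampleRate.
class SamplingMonitor {
  constructor() {
    this.dropped = 0;
    this.total = 0;
  }

  // Call with the HTTP status of each sampled (non-forced) request.
  record(status) {
    this.total++;
    if (status === 204) this.dropped++;
  }

  // Fraction of recorded requests that returned 204, or 0 if none recorded.
  dropRate() {
    return this.total === 0 ? 0 : this.dropped / this.total;
  }

  // True when the observed drop rate is within `tolerance` of 1 - sampleRate.
  isAsExpected(sampleRate, tolerance = 0.05) {
    return Math.abs(this.dropRate() - (1 - sampleRate)) <= tolerance;
  }
}
```

Small samples are noisy, so a check like isAsExpected is best read only after a few hundred requests have been recorded.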