<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[qryn: polyglot monitoring and observability]]></title><description><![CDATA[qryn is a fast, thin, all-in-one polyglot observability stack built on top of ClickHouse
Logs, Metrics and Traces made simple, right out of the box - batteries included! 🔋]]></description><link>https://blog.gigapipe.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1736852985560/280c1544-aa3b-4937-9011-781a7eba7b56.png</url><title>qryn: polyglot monitoring and observability</title><link>https://blog.gigapipe.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 08 Apr 2026 10:31:55 GMT</lastBuildDate><atom:link href="https://blog.gigapipe.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Tail Sampling with Otel + Gigapipe]]></title><description><![CDATA[In modern observability, capturing and storing every trace can quickly become impractical due to storage costs and noise from less relevant data. Tail sampling is a powerful technique that enables smarter trace retention by evaluating full traces bef...]]></description><link>https://blog.gigapipe.com/tail-sampling-with-otel-and-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/tail-sampling-with-otel-and-qryn</guid><category><![CDATA[tail sampling]]></category><category><![CDATA[otel-collector]]></category><category><![CDATA[gigapipe]]></category><category><![CDATA[observability]]></category><category><![CDATA[distributed tracing]]></category><category><![CDATA[OpenTelemetry]]></category><dc:creator><![CDATA[Alex Maitland]]></dc:creator><pubDate>Tue, 04 Feb 2025 21:56:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738705895366/f1355b43-9c0e-427b-a4e4-d0a1371a268d.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738705678305/77819822-a5ac-4c90-abf3-fe8bb7b85893.gif" alt class="image--center mx-auto" /></p>
<p>In modern observability, capturing and storing every trace can quickly become impractical due to storage costs and noise from less relevant data. <strong>Tail sampling</strong> is a powerful technique that enables smarter trace retention by evaluating full traces before deciding whether to keep them. Let’s see how we can leverage this in combination with the Gigapipe polyglot stack.</p>
<h2 id="heading-what-is-tail-sampling">What is Tail Sampling?</h2>
<p>Unlike head-based sampling, which makes decisions at the start of a trace, tail sampling occurs after a trace is completed. This approach provides richer context, ensuring important traces—such as slow requests, errors, or specific customer interactions—are retained for analysis. The OpenTelemetry Collector supports tail sampling through its <code>tailsamplingprocessor</code>, allowing for advanced filtering and retention policies.</p>
<h2 id="heading-leveraging-qrynhttpsgigapipecom-as-an-opentelemetry-receiver">Leveraging <a target="_blank" href="https://gigapipe.com">qryn</a> as an OpenTelemetry Receiver</h2>
<p><a target="_blank" href="https://gigapipe.com">qryn</a>, a high-performance observability backend, acts as a native OpenTelemetry receiver, ingesting traces, logs, and metrics while offering native LogQL, PromQL, and Tempo compatibility.</p>
<p>By integrating the OpenTelemetry Collector with <a target="_blank" href="https://gigapipe.com">qryn</a>, organizations can benefit from a seamless pipeline where tail sampling decisions are made before storing data in <a target="_blank" href="https://gigapipe.com">qryn</a>. This setup optimizes both storage efficiency and query performance.</p>
<h3 id="heading-configuring-tail-sampling-with-qryn">Configuring Tail Sampling with qryn</h3>
<p>To enable tail sampling with qryn and OpenTelemetry, follow these key steps:</p>
<ol>
<li><p><strong>Deploy OpenTelemetry Collector</strong> – Ensure your collector is set up to receive traces from applications and forward them to qryn.</p>
</li>
<li><p><strong>Enable the Tail Sampling Processor</strong> – Define sampling rules in your <code>otel-collector-config.yaml</code>, such as retaining traces based on status codes, duration, or custom attributes.</p>
</li>
<li><p><strong>Export to qryn</strong> – Configure the collector to send selected traces to qryn’s OpenTelemetry-compatible API.</p>
</li>
</ol>
<h4 id="heading-example-opentelemetry-collector-configuration">Example OpenTelemetry Collector Configuration:</h4>
<pre><code class="lang-yaml"><span class="hljs-attr">receivers:</span>
  <span class="hljs-attr">otlp:</span>
    <span class="hljs-attr">protocols:</span>
      <span class="hljs-attr">grpc:</span>
      <span class="hljs-attr">http:</span>

<span class="hljs-attr">processors:</span>
  <span class="hljs-attr">tailsampling:</span>
    <span class="hljs-attr">decision_wait:</span> <span class="hljs-string">10s</span>
    <span class="hljs-attr">policies:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">error_traces</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">status_code</span>
        <span class="hljs-attr">status_code:</span>
          <span class="hljs-attr">status_codes:</span> [<span class="hljs-string">ERROR</span>]
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">long_traces</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">latency</span>
        <span class="hljs-attr">latency:</span>
          <span class="hljs-attr">threshold_ms:</span> <span class="hljs-number">1000</span>

<span class="hljs-attr">exporters:</span>
  <span class="hljs-attr">otlphttp:</span>
    <span class="hljs-attr">endpoint:</span> <span class="hljs-string">"http://qryn-gigapipe/api/v1/traces"</span>

<span class="hljs-attr">service:</span>
  <span class="hljs-attr">pipelines:</span>
    <span class="hljs-attr">traces:</span>
      <span class="hljs-attr">receivers:</span> [<span class="hljs-string">otlp</span>]
      <span class="hljs-attr">processors:</span> [<span class="hljs-string">tailsampling</span>]
      <span class="hljs-attr">exporters:</span> [<span class="hljs-string">otlphttp</span>]
</code></pre>
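<p>Policies can also be combined. For example, alongside the error and latency policies above, the processor’s <code>probabilistic</code> policy type can keep a small random share of all remaining traces as a baseline. This is a sketch only; the 10% figure is illustrative, not a recommendation:</p>

```yaml
processors:
  tailsampling:
    decision_wait: 10s
    policies:
      # keep every trace containing an error
      - name: error_traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      # keep traces slower than 1 second
      - name: long_traces
        type: latency
        latency:
          threshold_ms: 1000
      # plus a 10% random sample of everything else
      - name: baseline_sample
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

<p>Tune <code>decision_wait</code> to be longer than your slowest expected trace, otherwise late spans arrive after the sampling decision has been made.</p>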
<h2 id="heading-benefits-of-tail-sampling">Benefits of Tail Sampling</h2>
<ul>
<li><p><strong>Reduced Storage Costs</strong> – By retaining only high-value traces, organizations can significantly cut down on observability storage expenses.</p>
</li>
<li><p><strong>Improved Query Performance</strong> – Less noise in the dataset leads to faster and more meaningful trace analysis.</p>
</li>
<li><p><strong>Enhanced Decision-Making</strong> – Tail sampling enables intelligent data retention, keeping critical issues and performance bottlenecks visible.</p>
</li>
</ul>
<h2 id="heading-its-that-simple">It’s that simple.</h2>
<p>By combining OpenTelemetry’s tail sampling capabilities with qryn’s scalable and efficient backend, teams can fine-tune their observability pipelines for optimal performance and cost-effectiveness. Implementing tail sampling ensures that only the most relevant traces are retained, enabling deeper insights and better troubleshooting without unnecessary data overload.</p>
<p><strong>Sign up for a free trial account at</strong> <a target="_blank" href="https://gigapipe.com"><strong>Gigapipe</strong></a><strong>.</strong> Bring your own OTEL Logs, Metrics and Traces to enjoy our truly polyglot observability platform.</p>
]]></content:encoded></item><item><title><![CDATA[The Hidden Costs of Cloud Observability: Why Gigapipe Stands Out]]></title><description><![CDATA[As observability becomes a cornerstone of modern infrastructure, organizations are turning to platforms like Grafana Cloud and Datadog to monitor logs, metrics, and traces. However, these tools often come with hidden complexities, not just in their f...]]></description><link>https://blog.gigapipe.com/the-hidden-costs-of-cloud-observability-why-gigapipe-stands-out</link><guid isPermaLink="true">https://blog.gigapipe.com/the-hidden-costs-of-cloud-observability-why-gigapipe-stands-out</guid><category><![CDATA[Open Source]]></category><category><![CDATA[observability]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[Datadog]]></category><category><![CDATA[Logs]]></category><category><![CDATA[metrics]]></category><category><![CDATA[traces]]></category><category><![CDATA[cloudcostmanagement]]></category><dc:creator><![CDATA[Alex Maitland]]></dc:creator><pubDate>Mon, 20 Jan 2025 08:22:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736955517297/ef0d2c99-54ef-429a-aa28-b37bd9ebd93c.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As observability becomes a cornerstone of modern infrastructure, organizations are turning to platforms like Grafana Cloud and Datadog to monitor logs, metrics, and traces. However, these tools often come with hidden complexities, not just in their functionality but also in their pricing models. At Gigapipe, we’ve built an alternative that prioritizes simplicity and affordability, ensuring that observability is accessible to all teams without breaking the bank.</p>
<p>Vantage recently released a detailed article comparing the complex costing models of both Datadog and Grafana cloud: <a target="_blank" href="https://www.vantage.sh/blog/datadog-vs-grafana-cost">https://www.vantage.sh/blog/datadog-vs-grafana-cost</a></p>
<p>It highlights the pitfalls many companies fall into with ‘cheap entry’ or ‘free credits’ offerings, which can end up vendor-locking you in with spiralling costs.</p>
<h2 id="heading-the-challenges-of-cloud-observability"><strong>The Challenges of Cloud Observability</strong></h2>
<p>Cloud observability platforms promise seamless insights into your systems, but the reality is often more complicated. Here are the three primary challenges we see in the industry:</p>
<p><strong>1. Overwhelming Complexity</strong></p>
<p>Many observability tools are packed with features, but this often leads to steep learning curves and, more often than not, a system you end up using only a fraction of while paying for all of it. Configuring alerts, managing dashboards, and integrating with existing tools can become a daunting task, especially for smaller teams.</p>
<p><strong>2. Performance Trade-offs</strong></p>
<p>Running observability platforms in the cloud means relying on their infrastructure. This can lead to performance bottlenecks, especially during high-traffic periods or when handling large datasets. For example, in Grafana Cloud, users often report delays in rendering dashboards or querying data during peak usage as you cannot control the power of the infrastructure you’re using.</p>
<p><strong>3. Opaque Pricing Models</strong></p>
<p>The most significant pain point is pricing. Platforms like Datadog and Grafana Cloud often employ complex, usage-based pricing models that make it hard to predict costs. The Vantage blog post highlights how these models can spiral out of control, especially for growing businesses. From per-user fees to charges based on data ingestion, storage, and query volume, the pricing structure often feels more like a labyrinth than a transparent system.</p>
<h2 id="heading-the-gigapipe-difference"><strong>The Gigapipe Difference</strong></h2>
<p>At Gigapipe, we’ve reimagined cloud observability with a focus on simplicity and fairness. Here’s how we stand out:</p>
<p><strong>1. Simple, Transparent Pricing</strong></p>
<p>Our pricing model is straightforward: one flat rate based on your data volume. No hidden fees, no surprises. Whether you’re a startup monitoring a few services or an enterprise handling terabytes of data, you’ll know exactly what you’re paying. Check out our pricing page for details: <a target="_blank" href="https://gigapipe.com/pricing.html">Gigapipe Pricing</a>.</p>
<p><strong>2. Superior Performance</strong></p>
<p>Gigapipe’s architecture leverages its own open-source observability suite (think of it as if Grafana combined all their different tools and correlated it all for you in a single database), ensuring high-speed data processing and real-time insights. By optimizing our platform for performance, we’ve eliminated many of the latency issues users face with other cloud solutions.</p>
<p><strong>3. User-Friendly Design</strong></p>
<p>We’ve built Gigapipe to be intuitive, with streamlined dashboards and easy-to-configure alerts. This means less time spent on setup and troubleshooting and more time focusing on your core business.</p>
<h2 id="heading-comparing-pricing-gigapipe-vs-the-alternatives"><strong>Comparing Pricing: Gigapipe vs. the alternatives</strong></h2>
<p>Let’s take a closer look at how Gigapipe compares to other platforms:</p>
<ul>
<li><p><strong>Grafana Cloud</strong>: While Grafana Cloud offers a free tier, its paid plans quickly become expensive as you scale. The pricing is based on ingestion rates, retention periods, and additional features like alerting and user management. Predicting your monthly bill can feel like solving a puzzle.</p>
</li>
<li><p><strong>Datadog</strong>: Known for its robust features, Datadog’s pricing model is even more intricate. With separate charges for infrastructure monitoring, log management, APM, and more, businesses often find themselves paying far more than expected.</p>
</li>
<li><p><strong>Gigapipe</strong>: In contrast, Gigapipe offers a single, <a target="_blank" href="https://gigapipe.com/pricing.html">predictable rate</a>. No matter how much you scale, you’ll always know your costs upfront. This makes budgeting easier and removes the stress of surprise bills.</p>
<ul>
<li><p>€149/month: 32GB RAM, 8 vCPUs, 1TB data storage</p>
</li>
<li><p>€249/month: 48GB RAM, 10 vCPUs, 2TB data storage</p>
</li>
<li><p>…</p>
</li>
<li><p>€1449/month: 192GB RAM, 48 vCPUs, 10TB data storage</p>
</li>
</ul>
</li>
</ul>
<p>To make the best direct comparison possible, we’ll use Vantage’s example for Logs. This example is already quite heavily biased towards Grafana over Datadog, making Grafana look far more appealing, since in this specific case Datadog’s indexing pushes its monthly rate over the $65k mark.</p>
<p>Their example also doesn’t account for queries and compute, which would increase the cost further, especially in a read-heavy setup. However, here are their proposed costings for each (quoted from the article):</p>
<h4 id="heading-pricing-scenario-3-logs"><strong>Pricing Scenario #3: Logs</strong></h4>
<p>25,000 GB of log data is ingested and stored for 1 month.</p>
<p><strong>Grafana Cloud:</strong> 25,000 GB x $0.50 per GB = <strong>$12,500 total Grafana Cloud</strong></p>
<p><strong>Datadog:</strong></p>
<p>25,000 GB x $0.10 per GB = $2,500 for ingestion</p>
<p>25,000 GB / 1 KB (assuming average log event size is 1 KB) = 25 billion log events</p>
<p>25 billion log events x ($2.50 / 1 million log events) = $62,500 for indexing</p>
<p>$2,500 for ingestion + $62,500 for indexing = <strong>$65,000 total Datadog</strong></p>
<p><strong>Gigapipe:</strong></p>
<p>To put it into perspective: for 25 billion logs equalling 25TB for the month, here’s what a sensible breakdown in Gigapipe would look like, taking into account the likely querying requirements of a dataset of this size:</p>
<ul>
<li><p>16 servers (on our scale-up plan €249/month) = €3984/month</p>
<ul>
<li><p>Per machine:</p>
<ul>
<li><p>48GB RAM</p>
</li>
<li><p>10 vCPUs</p>
</li>
</ul>
</li>
<li><p>Total storage: 32TB uncompressed</p>
</li>
</ul>
</li>
</ul>
<p><strong>Total monthly cost under Gigapipe: €3984/month or $4101/month</strong></p>
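<p>The scenario’s arithmetic can be sanity-checked in a few lines (a sketch using the list prices quoted above; currency conversion is left out):</p>

```python
# Reproduce the Vantage log-pricing scenario: 25,000 GB ingested and stored for 1 month.
ingested_gb = 25_000

grafana = ingested_gb * 0.50                  # $0.50/GB ingested and stored
datadog_ingest = ingested_gb * 0.10           # $0.10/GB ingestion
events = ingested_gb * 1_000_000              # 1 KB average event -> 1M events per GB
datadog_index = (events / 1_000_000) * 2.50   # $2.50 per 1M indexed events
datadog = datadog_ingest + datadog_index

gigapipe_eur = 16 * 249                       # 16 scale-up servers at EUR 249/month

print(grafana, datadog, gigapipe_eur)
```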
<p>Gigapipe aims to provide the tools and infrastructure growing companies require, in a clear and easily forecastable way: infinite scalability with a clear cost-versus-performance structure that will remain cost-effective as clients grow.</p>
<h2 id="heading-why-simplicity-matters"><strong>Why Simplicity Matters</strong></h2>
<p>Complex pricing models don’t just hurt your wallet; they also waste valuable time. Teams end up spending hours trying to optimize their usage to fit within budget constraints. At Gigapipe, we believe that observability should empower teams, not burden them. By keeping our pricing simple and our setup fast, we let you focus on what matters: building and maintaining great systems.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Cloud observability is essential, but it shouldn’t come with unnecessary complexity or unpredictable costs. Gigapipe offers a streamlined alternative that combines powerful features with straightforward pricing. If you’re tired of navigating convoluted pricing models and dealing with performance issues, it’s time to give Gigapipe a try.</p>
<p>Visit <a target="_blank" href="https://gigapipe.com/">Gigapipe</a> today to learn more and see how we can transform your observability experience.</p>
]]></content:encoded></item><item><title><![CDATA[🐤 Merging Parquet with chsql + duckdb]]></title><description><![CDATA[If you’re familiar with our blog, you already know about our DuckDB Community Extension chsql providing a growing number of ClickHouse SQL macros and functions for DuckDB users. You can install chsql right from DuckDB SQL:
INSTALL chsql FROM communit...]]></description><link>https://blog.gigapipe.com/merging-parquet-with-chsql-duckdb</link><guid isPermaLink="true">https://blog.gigapipe.com/merging-parquet-with-chsql-duckdb</guid><category><![CDATA[duckDB]]></category><category><![CDATA[ClickHouse]]></category><category><![CDATA[SQL]]></category><category><![CDATA[opensource]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Sun, 13 Oct 2024 22:00:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728724569538/352e7b61-b25d-4a41-9108-8b9120a640d9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://community-extensions.duckdb.org/extensions/chsql.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728724636889/1747181b-ea86-4097-b4cf-783679b6deaa.png" alt class="image--center mx-auto" /></a></p>
<p>If you’re familiar with our blog, you already know about our <a target="_blank" href="https://duckdb.org">DuckDB</a> Community Extension <a target="_blank" href="https://community-extensions.duckdb.org/extensions/chsql.html">chsql</a> providing a growing number of <strong>ClickHouse SQL</strong> macros and functions for DuckDB users. You can install <a target="_blank" href="https://community-extensions.duckdb.org/extensions/chsql.html"><strong>chsql</strong></a> right from DuckDB SQL:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSTALL</span> chsql <span class="hljs-keyword">FROM</span> community;
<span class="hljs-keyword">LOAD</span> chsql;
</code></pre>
<h3 id="heading-merging-parquet-files-with-chsql-mergetree">Merging Parquet files with chsql mergetree</h3>
<p>Today we’re introducing a new original function: <code>read_parquet_mergetree</code></p>
<p>If you're a fan of <strong>ClickHouse</strong>'s MergeTree engine and you're looking to supercharge your data workflow within <strong>DuckDB</strong>, the new <code>read_parquet_mergetree</code> function from our <strong>chsql</strong> extension is going to be your new best friend. It’s a memory-efficient, easy-to-use feature that lets you merge multiple Parquet files with sorting capabilities, emulating the best parts of ClickHouse's powerful data merging strategy.</p>
<h3 id="heading-what-does-readparquetmergetree-do">What Does <code>read_parquet_mergetree</code> do?</h3>
<p>This new function does exactly what it says on the tin: it reads and <strong>merges multiple Parquet files</strong> based on a user-specified <strong>primary sort key</strong>. Think of it as ClickHouse’s <strong>MergeTree</strong> engine for <strong>DuckDB</strong>, but tailored to handle massive datasets <strong>without hogging your system's memory</strong>.</p>
<p>The result? You get <strong>compact, sorted Parquet files</strong> that are ready for <em>lightning-fast range queries.</em></p>
<h3 id="heading-why-should-you-care">Why should you care? 🚀</h3>
<p><em>Here’s the TL;DR:</em></p>
<ul>
<li><p><strong>Efficient Merging:</strong> Combine data from multiple Parquet files just like how ClickHouse <strong>MergeTree</strong> tables consolidate data.</p>
</li>
<li><p><strong>Sort and Compact:</strong> Set a primary sort key to organize your data, optimizing it for fast queries and analysis.</p>
</li>
<li><p><strong>Memory Savvy:</strong> Perfect for large datasets where memory constraints matter.</p>
</li>
<li><p><strong>Wildcard Support:</strong> Supports glob patterns and wildcards, just like <strong>read_parquet</strong></p>
</li>
</ul>
<h3 id="heading-how-to-use-it">How to Use It</h3>
<p>Here's the beauty of <code>read_parquet_mergetree</code>—it fits seamlessly into your <a target="_blank" href="https://duckdb.org">DuckDB</a> workflow. The syntax is intuitive for ClickHouse users but also simple for anyone new:</p>
<pre><code class="lang-sql">COPY (<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> read_parquet_mergetree([<span class="hljs-string">'/folder/*.parquet'</span>], <span class="hljs-string">'some_key'</span>)) 
<span class="hljs-keyword">TO</span> <span class="hljs-string">'sorted.parquet'</span>;
</code></pre>
<p>This command:</p>
<ul>
<li><p>Reads all Parquet files in the folder (thanks to wildcard support),</p>
</li>
<li><p>Merges them based on the selected primary sort key (<code>some_key</code>),</p>
</li>
<li><p>Outputs the <strong>sorted</strong> and <strong>compacted</strong> result into a <em>new Parquet file.</em></p>
</li>
</ul>
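<p>To sanity-check the result, DuckDB’s <code>parquet_metadata</code> function can inspect the merged file’s row-group statistics: in a file sorted on <code>some_key</code>, the per-row-group min/max ranges should be non-overlapping. A hypothetical follow-up query, reusing the file and column names from the example above:</p>

```sql
-- Row-group statistics for the sort key: for a properly sorted file,
-- the min/max ranges across row groups should not overlap.
SELECT row_group_id, stats_min_value, stats_max_value
FROM parquet_metadata('sorted.parquet')
WHERE path_in_schema = 'some_key';
```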
<h2 id="heading-real-world-benchmark-memory-efficiency">Real-World Benchmark: Memory Efficiency</h2>
<p>To illustrate how much memory <code>read_parquet_mergetree</code> can save you, consider this benchmark:</p>
<pre><code class="lang-sql">COPY (<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> read_parquet([<span class="hljs-string">'/folder/*.parquet'</span>]) <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> some_key) 
<span class="hljs-keyword">TO</span> <span class="hljs-string">'sorted.parquet'</span>;
</code></pre>
<p>The <strong>read_parquet</strong> function uses <strong>all of our system’s RAM (64GB)</strong> to run the task.</p>
<p>Now, let’s compare that to <code>read_parquet_mergetree</code>:</p>
<pre><code class="lang-sql">COPY (<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> read_parquet_mergetree([<span class="hljs-string">'/folder/*.parquet'</span>], <span class="hljs-string">'some_key'</span>)) 
<span class="hljs-keyword">TO</span> <span class="hljs-string">'sorted.parquet'</span>;
</code></pre>
<p>On the same system, the <strong>read_parquet_mergetree</strong> query uses only <strong>~800MB of RAM</strong> <em>(an ~80x reduction)</em>, making it perfect for those working with large datasets on resource-constrained systems.</p>
<p><em>Used in combination with</em> <strong><em>HTTPFS</em></strong> <em>it can be used to merge remote files, too!</em></p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>If you're working with <strong>large-scale data</strong> and need a tool that can <strong>merge and sort Parquet files</strong> efficiently, the <code>read_parquet_mergetree</code> function in the <strong>chsql</strong> extension is a game changer. Whether you're a ClickHouse user or a DuckDB enthusiast, this feature allows you to manage your data with unparalleled efficiency.</p>
<p><em>Try it out if you need to merge fast, compact, and sorted Parquet files! 🐤</em></p>
<h3 id="heading-join-our-community">Join our Community</h3>
<p><em>Got ideas for the chsql extension? Join our team, let’s make this happen!</em></p>
<p><a target="_blank" href="https://community-extensions.duckdb.org/extensions/chsql.html"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728724636889/1747181b-ea86-4097-b4cf-783679b6deaa.png" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[$0/month Observability with qryn]]></title><description><![CDATA[Self-Hosted Observability is not only viable - its the best way to get your stack under control without creating dependencies on cloud services or specific service providers and more importantly - without making your job harder in the process.
We run...]]></description><link>https://blog.gigapipe.com/free-observability-with-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/free-observability-with-qryn</guid><category><![CDATA[always-free]]></category><category><![CDATA[qryn]]></category><category><![CDATA[observability]]></category><category><![CDATA[free]]></category><category><![CDATA[Oracle Cloud]]></category><category><![CDATA[OpenTelemetry]]></category><category><![CDATA[gigapipe]]></category><dc:creator><![CDATA[Alex Maitland]]></dc:creator><pubDate>Wed, 02 Oct 2024 22:00:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727566752184/dff8b1ff-b9ae-4c90-a294-8a4e29d0b6bc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Self-Hosted Observability</strong> is not only viable - its the best way to get your stack under control without creating dependencies on cloud services or specific service providers and more importantly - without making your job harder in the process.</p>
<p>We run a cloud service ourselves, and there’s no shame in admitting that its overhead cost can become a deal breaker for customers when size and scale grow larger than life.</p>
<p><strong>If you’re trying to migrate off Grafana Cloud/AWS/Datadog/etc this is for you!</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727378299764/eb7314d8-8bab-4466-a16f-84613fb2f2ec.png" alt /></p>
<h2 id="heading-self-hosted-all-in-one">⭐ Self-Hosted + All-in-One</h2>
<p>Here's our solid <strong>opensource</strong> recipe you can cook with three simple ingredients:</p>
<ul>
<li><p>⭐ <strong>qryn: all-in-one polyglot stack</strong></p>
</li>
<li><p>⭐ <strong>clickhouse: fast OLAP database</strong></p>
</li>
<li><p>⭐ <strong>Opentelemetry: industry standard ingestion</strong></p>
</li>
</ul>
<p><em>This will get you covered for any Logs, Metrics, Traces and Profiles - at once.</em></p>
<p><img src="https://scioinfotech.com/wp-content/uploads/2019/10/oracle-cloud-logo-e1572318472568.png" alt="oracle-cloud-logo | Scio Info Tech" /></p>
<h3 id="heading-oracle-cloud-always-free">⭐ Oracle Cloud: Always Free?</h3>
<p><strong>Hello, Oracle.</strong> Now, I'm not their biggest fan but respect where it's due:</p>
<p>The <strong><em><mark>“Always-Free”</mark></em></strong> Oracle Cloud tier generously allows running <a target="_blank" href="https://www.oracle.com/cloud/compute/arm/">ARM Ampere A1</a> instances with 3,000 CPU hours and 18,000 GB hours per month, which affords exactly 4 CPUs and 24GB of RAM with a ~100GB storage volume. <em>For free. Forever.</em></p>
<h4 id="heading-grab-a-free-ampere-arm64-instance">Grab a FREE AMPERE ARM64 Instance</h4>
<p>First, sign up to <strong>Oracle Cloud</strong> to gain access to their <strong>Always-Free</strong> Tier.</p>
<ul>
<li><p>Browse to the <strong>Compute</strong> section and <strong>Create a New Instance</strong></p>
</li>
<li><p>Modify the parameters in the <strong>Image and Shape</strong> section by clicking <strong>Edit</strong></p>
</li>
<li><p>Choose an <strong>AMPERE</strong> shape, and <strong><em>max out</em></strong> <em>the resources</em>:</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689622117494/8170ca40-c872-4943-98f2-768848ebdb6b.png" alt class="image--center mx-auto" /></p>
<p><em>A few more clicks and you're ready to go. Refer to the Oracle docs for more options!</em></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong><mark>Do NOT forget to download the generated keys for SSH access!</mark></strong></div>
</div>

<h3 id="heading-prepare-your-arm64-instance">⭐ Prepare your <strong>ARM64</strong> Instance</h3>
<p>Once your Ampere instance is ready, install the latest <strong>Docker</strong> for <strong>arm64</strong></p>
<pre><code class="lang-bash">$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh
</code></pre>
<h3 id="heading-install-qryn">⭐ Install qryn</h3>
<p>This step is where the magic happens: <strong>qryn</strong> transforms your instance into a polyglot stack!</p>
<p><em>Our stack natively supports arm64 architecture, so no special actions are needed.</em></p>
<p>Let’s deploy the full <strong>qryn</strong> bundle using <code>docker compose</code> on our brand new host:</p>
<pre><code class="lang-bash">$ git <span class="hljs-built_in">clone</span> https://github.com/metrico/qryn-oss-demo
$ <span class="hljs-built_in">cd</span> qryn-oss-demo

$ docker compose up -d
</code></pre>
<p><em>Wait for all services to start and you’re ready to go!</em></p>
<p><strong>Grafana</strong> is included on port <strong>3000</strong> preconfigured with all the <strong>qryn</strong> datasources, demo dashboards and demo telemetry instantly ready to explore and play with:</p>
<p><img src="https://user-images.githubusercontent.com/1423657/186014786-165b18da-e808-4cf7-a6fc-eb90df705400.gif" alt /></p>
<h3 id="heading-add-opentelemetry">⭐ Add Opentelemetry</h3>
<p>It’s time to ingest your own data into the system using Opentelemetry.</p>
<p><strong>Opentelemetry</strong> is the standard when it comes to observability instrumentation. Paired with <strong>qryn</strong> it allows methodic ingestion of any telemetry type <em>(Logs, Metrics, Traces, Profiling)</em> using common interfaces compatible with many vendors.</p>
<blockquote>
<p><em>"OTEL" usage = no vendor or tech lock-ins as part of your deployments.</em></p>
</blockquote>
<p>The qryn <a target="_blank" href="https://github.com/metrico/otel-collector"><strong>otel-collector</strong></a> allows ingesting massive amounts of data directly into the bundled <strong>ClickHouse</strong> instance using the native binary drivers and delivering incredible throughput and speed, easily extensible with object storage.</p>
<h3 id="heading-service-ports">⭐ Service Ports</h3>
<p>The qryn API supports ingestion of several protocols but for our high-performance setup we will leverage the <a target="_blank" href="https://github.com/metrico/otel-collector">qryn opentelemetry collector</a> for writing into our database.</p>
<p>The following <strong>service ports</strong> are exposed by the <strong>default qryn collector</strong> config:</p>
<pre><code class="lang-yaml">ports:
  - "3200:3100"   # Loki/LogQL HTTP receiver
  - "3201:3200"   # Loki/LogQL gRPC receiver
  - "8088:8088"   # Splunk HEC receiver
  - "5514:5514"   # Syslog TCP receiver
  - "24224:24224" # Fluent Forward receiver
  - "4317:4317"   # OTLP gRPC receiver
  - "4318:4318"   # OTLP HTTP receiver
  - "14250:14250" # Jaeger gRPC receiver
  - "14268:14268" # Jaeger Thrift HTTP receiver
  - "9411:9411"   # Zipkin trace receiver
  - "11800:11800" # SkyWalking gRPC receiver
  - "12800:12800" # SkyWalking HTTP receiver
  - "8086:8086"   # InfluxDB line protocol HTTP receiver
  - "8062:8062"   # Pyroscope jprof receiver
</code></pre>
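<p>As a quick smoke test you can push a log line straight to the Loki HTTP receiver. The sketch below assumes the default port mapping above and a collector running on localhost (Node 18+ for the built-in <code>fetch</code>); adjust the host and port for your deployment:</p>

```javascript
// Build a minimal Loki push payload: one stream with a small label set
// and a single [timestamp-in-nanoseconds, line] entry.
function buildTestPayload(message) {
  const tsNs = Date.now().toString() + '000000'; // ms epoch -> ns as string
  return {
    streams: [
      {
        stream: { job: 'smoke-test', source: 'qryn-blog' },
        values: [[tsNs, message]],
      },
    ],
  };
}

// POST to the collector's Loki HTTP receiver (host port 3200 above).
async function pushTestLog() {
  try {
    const res = await fetch('http://localhost:3200/loki/api/v1/push', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(buildTestPayload('hello qryn')),
    });
    console.log('push status:', res.status);
  } catch (err) {
    console.error('collector not reachable:', err.message);
  }
}

pushTestLog();
```

<p>If the push succeeds, the line becomes queryable in Grafana with <code>{job="smoke-test"}</code>.</p>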
<h3 id="heading-telemetry-agents">⭐ Telemetry Agents</h3>
<p>You’re ready to choose any of the supported agents and start sending data:</p>
<ul>
<li><p><a target="_blank" href="https://vector.dev/"><strong>Vector</strong></a> <em>(logs, metrics)</em></p>
</li>
<li><p><a target="_blank" href="https://grafana.com/docs/agent/latest/"><strong>Grafana Agent</strong></a> <em>(logs, metrics, traces)</em></p>
</li>
<li><p><a target="_blank" href="https://grafana.com/docs/alloy/latest/"><strong>Alloy</strong></a> <em>(logs, metrics, traces, profiling)</em></p>
</li>
<li><p><a target="_blank" href="https://github.com/metrico/otel-collector"><strong>Opentelemetry Collector</strong></a> <em>(logs, metrics, traces, profiling)</em></p>
</li>
<li><p><strong>Any Agent</strong> compatible with the qryn APIs <em>(Loki, Prometheus, Tempo)</em></p>
</li>
</ul>
<p><strong><mark>That’s it! Your free Polyglot Observability stack is ready to use and abuse!</mark></strong></p>
<h3 id="heading-extending-capacity">⭐ Extending Capacity</h3>
<p>If the ~100GB of onboard storage runs out, attach <a target="_blank" href="https://blog.qryn.dev">S3 storage to your setup</a> or take a look at our sponsors at <a target="_blank" href="https://gigapipe.com">Gigapipe</a> and their flat-price qryn observability SaaS.</p>
<p><a target="_blank" href="https://qryn.metrico.in"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt /></a></p>
<blockquote>
<p><em>That's it. One API. One datastore. A thousand formats and use cases</em> 🎉</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Tigris S3 Storage + qryn]]></title><description><![CDATA[Meet Tigris
Tigris is a new distributed S3 compatible object storage operated by Fly.io and offering global bucket replication with low pricing and a generous free tier:

5GB of data storage per month

10,000 PUT, COPY, POST, LIST requests per month
...]]></description><link>https://blog.gigapipe.com/tigris-s3-storage-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/tigris-s3-storage-qryn</guid><category><![CDATA[tigris]]></category><category><![CDATA[observability]]></category><category><![CDATA[ClickHouse]]></category><category><![CDATA[S3]]></category><category><![CDATA[object storage]]></category><category><![CDATA[qryn]]></category><category><![CDATA[gigapipe]]></category><dc:creator><![CDATA[Alex Maitland]]></dc:creator><pubDate>Mon, 30 Sep 2024 22:00:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727562664483/451302ee-f1b5-4445-859c-815ec5223c80.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://www.tigrisdata.com/"><img src="https://www.tigrisdata.com/docs/logo/light.png" alt class="image--center mx-auto" /></a></p>
<h1 id="heading-meet-tigris">Meet Tigris</h1>
<p><a target="_blank" href="https://www.tigrisdata.com/docs/overview/">Tigris</a> is a new distributed S3 compatible object storage operated by <a target="_blank" href="https://fly.io">Fly.io</a> and offering global bucket replication with low pricing and a generous free tier:</p>
<ul>
<li><p><a target="_blank" href="https://www.tigrisdata.com/docs/pricing/#free-allowances">5GB of data storage</a> per month</p>
</li>
<li><p>10,000 PUT, COPY, POST, LIST requests per month</p>
</li>
<li><p>100,000 GET, SELECT and all other requests per month</p>
</li>
</ul>
<h3 id="heading-example">Example</h3>
<p>Let's say you have a bucket with <strong>100GB</strong> of data and you make <strong>100,000 PUT</strong> and <strong>1,000,000 GET</strong> requests to the objects in the bucket. You would be charged as follows:</p>
<ul>
<li><p>Data Storage: 5GB x $0 + 95GB x $0.02/GB/month = <strong>$1.90</strong></p>
</li>
<li><p>PUT Requests: 10,000 x $0 + 90,000 x $0.005/1000 requests = <strong>$0.45</strong></p>
</li>
<li><p>GET Requests: 100,000 x $0 + 900,000 x $0.0005/1000 requests = <strong>$0.45</strong></p>
</li>
<li><p>Data Transfer: <strong>$0</strong></p>
</li>
</ul>
<p><strong>There’s more!</strong> Storage costs are calculated using <strong>GB/month</strong>, determined by averaging the daily peak storage over a monthly period. For example:</p>
<ul>
<li><p>Storing 1 GB constantly for a whole month = 1 GB/month</p>
</li>
<li><p>Storing 10 GB for 12 days + 20 GB for 18 days = 16 GB/month</p>
</li>
</ul>
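<p>To make the arithmetic concrete, here is a small sketch of the tiered billing model using the rates from the example above (an illustration, not an official Tigris calculator):</p>

```javascript
// Tiered pricing: the first `freeUnits` are free, the remainder is billed.
function tieredCost(units, freeUnits, ratePerUnit) {
  return Math.max(0, units - freeUnits) * ratePerUnit;
}

// Example from the text: 100GB stored, 100,000 PUTs, 1,000,000 GETs.
const storage = tieredCost(100, 5, 0.02);                // $/GB/month
const puts = tieredCost(100000, 10000, 0.005 / 1000);    // $ per 1000 requests
const gets = tieredCost(1000000, 100000, 0.0005 / 1000); // $ per 1000 requests

console.log(storage.toFixed(2)); // "1.90"
console.log(puts.toFixed(2));    // "0.45"
console.log(gets.toFixed(2));    // "0.45"

// GB/month averages the daily peak storage over a 30-day month:
const gbMonth = (10 * 12 + 20 * 18) / 30;
console.log(gbMonth); // 16
```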
<p>🚀 Sounds interesting? Get ready! This example shows how to use Tigris buckets as a cold-storage disk with the ClickHouse S3 table engine and qryn. Let’s do this.</p>
<h2 id="heading-setup-instructions">Setup Instructions</h2>
<h4 id="heading-get-tigris">Get Tigris</h4>
<ul>
<li>Sign in to your <strong>Fly.io/Tigris account</strong> and create a new bucket, e.g.:</li>
</ul>
<pre><code class="lang-bash">https://yourbucket.fly.storage.tigris.dev
</code></pre>
<ul>
<li>Generate a <strong>token pair</strong> with write permissions to the bucket, e.g.:</li>
</ul>
<pre><code class="lang-bash">Access Key ID = XXXXXXXXXXXXXXXXXXXXXXXX
Secret Access Key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
</code></pre>
<h4 id="heading-clickhouse">ClickHouse</h4>
<p>Before we proceed, let’s validate our bucket and practice some simple queries.</p>
<ul>
<li><p>Configure an <strong>S3 table</strong> in ClickHouse using <strong>Parquet</strong> format</p>
</li>
<li><p>Configure the <strong>S3 Engine</strong> with your <strong>Tigris</strong> bucket and tokens</p>
</li>
<li><p>Configure <strong>max_threads, max_insert_threads</strong> based on your CPU cores</p>
</li>
</ul>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> s3_tigris (<span class="hljs-keyword">name</span> <span class="hljs-keyword">String</span>, <span class="hljs-keyword">value</span> UInt32) 
   <span class="hljs-keyword">ENGINE</span>=S3(<span class="hljs-string">'https://yourbucket.fly.storage.tigris.dev/somefolder/sometable.parquet'</span>, <span class="hljs-string">'XXXXXXXXXXXXXXXXXXXXXXXX'</span>, <span class="hljs-string">'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'</span>, <span class="hljs-string">'Parquet'</span>) 
   <span class="hljs-keyword">SETTINGS</span> max_threads=<span class="hljs-number">8</span>, max_insert_threads=<span class="hljs-number">8</span>, input_format_parallel_parsing=<span class="hljs-number">0</span>, input_format_with_names_use_header=<span class="hljs-number">0</span>;
</code></pre>
<ul>
<li><code>INSERT</code> &amp; <code>SELECT</code> data using the Tigris storage table</li>
</ul>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> s3_tigris <span class="hljs-keyword">VALUES</span> (<span class="hljs-string">'one'</span>, <span class="hljs-number">1</span>), (<span class="hljs-string">'two'</span>, <span class="hljs-number">2</span>), (<span class="hljs-string">'three'</span>, <span class="hljs-number">3</span>);
</code></pre>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> s3_tigris <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">2</span>;
</code></pre>
<p><em>Alright! If everything works as expected, we’re ready to steam right ahead.</em></p>
<h3 id="heading-tigris-storage-for-qryn">Tigris Storage for qryn</h3>
<p>Manual queries are fun - next let's configure Tigris as a ClickHouse <em>storage disk for our qryn instance to store our Logs, Metrics, Traces and Profiling data.</em></p>
<p>Here’s an overly simple configuration using S3 as the only storage for our data.</p>
<ul>
<li><p>Configure an <a target="_blank" href="https://clickhouse.com/docs/en/operations/storing-data"><strong>S3 disk</strong></a> with <strong>data_cache_enabled</strong></p>
</li>
<li><p>Configure a <a target="_blank" href="https://clickhouse.com/docs/en/operations/system-tables/storage_policies"><strong>storage policy</strong></a> to automatically manage our cold storage</p>
</li>
<li><p>Configure <a target="_blank" href="https://clickhouse.com/docs/en/operations/storing-data"><strong>data_cache_max_size</strong></a> based on your storage configuration</p>
</li>
<li><p>Configure <a target="_blank" href="https://clickhouse.com/docs/en/operations/system-tables/storage_policies"><strong>move_factor</strong></a> based on the <a target="_blank" href="https://clickhouse.com/docs/en/operations/system-tables/storage_policies">desired ratio</a></p>
</li>
</ul>
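<p>One detail worth calling out: <strong>data_cache_max_size</strong> is expressed in bytes. A tiny sketch to derive the value used in the configuration below (8 GiB):</p>

```javascript
// data_cache_max_size is configured in bytes; convert from GiB.
const GiB = 1024 ** 3;
function cacheSizeBytes(gib) {
  return gib * GiB;
}

console.log(cacheSizeBytes(8)); // 8589934592
```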
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">yandex</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">storage_configuration</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">disks</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">tigris</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">type</span>&gt;</span>s3<span class="hljs-tag">&lt;/<span class="hljs-name">type</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">endpoint</span>&gt;</span>https://yourbucket.fly.storage.tigris.dev/fakekey<span class="hljs-tag">&lt;/<span class="hljs-name">endpoint</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">access_key_id</span>&gt;</span>XXXXXXXXXXXXXXXXXXXXXXXX<span class="hljs-tag">&lt;/<span class="hljs-name">access_key_id</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">secret_access_key</span>&gt;</span>YYYYYYYYYYYYYYYYYYYY<span class="hljs-tag">&lt;/<span class="hljs-name">secret_access_key</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">data_cache_enabled</span>&gt;</span>1<span class="hljs-tag">&lt;/<span class="hljs-name">data_cache_enabled</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">data_cache_max_size</span>&gt;</span>8589934592<span class="hljs-tag">&lt;/<span class="hljs-name">data_cache_max_size</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">tigris</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">disks</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">policies</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">external</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">volumes</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">s3</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">disk</span>&gt;</span>tigris<span class="hljs-tag">&lt;/<span class="hljs-name">disk</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">s3</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">volumes</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">external</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">tiered</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">move_factor</span>&gt;</span>0.05<span class="hljs-tag">&lt;/<span class="hljs-name">move_factor</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">volumes</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">hot</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">disk</span>&gt;</span>ssd<span class="hljs-tag">&lt;/<span class="hljs-name">disk</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">hot</span>&gt;</span>
          <span class="hljs-tag">&lt;<span class="hljs-name">s3</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">disk</span>&gt;</span>tigris<span class="hljs-tag">&lt;/<span class="hljs-name">disk</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">prefer_not_to_merge</span>&gt;</span>true<span class="hljs-tag">&lt;/<span class="hljs-name">prefer_not_to_merge</span>&gt;</span>
          <span class="hljs-tag">&lt;/<span class="hljs-name">s3</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">volumes</span>&gt;</span>
      <span class="hljs-tag">&lt;/<span class="hljs-name">tiered</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">policies</span>&gt;</span>
  <span class="hljs-tag">&lt;/<span class="hljs-name">storage_configuration</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">yandex</span>&gt;</span>
</code></pre>
<p><em>Note: Performance may vary based on network conditions and available resources</em></p>
<p>🗨️ <mark>If you have feedback or use </mark> <strong><mark>Tigris Buckets</mark></strong> <mark> with </mark> <strong><mark>ClickHouse </mark></strong> <mark>and </mark> <strong><mark>qryn</mark></strong><mark>, please consider </mark> <a target="_blank" href="https://github.com/metrico/qryn/issues"><mark>sharing your test results with our community</mark></a><mark>!</mark></p>
<hr />
<h4 id="heading-reference-links">Reference Links</h4>
<p><em>Interested in this subject?</em> Check out the following links for further information</p>
<ul>
<li><p><a target="_blank" href="https://clickhouse.com/docs/en/engines/table-engines/integrations/s3/">https://clickhouse.com/docs/en/engines/table-engines/integrations/s3/</a></p>
</li>
<li><p><a target="_blank" href="https://altinity.com/blog/tips-for-high-performance-clickhouse-clusters-with-s3-object-storage">https://altinity.com/blog/tips-for-high-performance-clickhouse-clusters-with-s3-object-storage</a></p>
</li>
<li><p><a target="_blank" href="https://blog.qryn.dev/cloudflare-r2-clickhouse">https://blog.qryn.dev/cloudflare-r2-clickhouse</a></p>
</li>
</ul>
<p><a target="_blank" href="https://qryn.dev"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[🔎 Cloudflare Tail Workers + qryn]]></title><description><![CDATA[Supercharge Your Observability: Using Cloudflare Tail Workers with qryn
In the ever-evolving landscape of cloud computing and web services, observability has become a critical aspect of maintaining robust and efficient systems. Today, we're excited t...]]></description><link>https://blog.gigapipe.com/cloudflare-tail-workers-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/cloudflare-tail-workers-qryn</guid><category><![CDATA[cloudflare]]></category><category><![CDATA[workers]]></category><category><![CDATA[Logs]]></category><category><![CDATA[observability]]></category><category><![CDATA[qryn]]></category><category><![CDATA[gigapipe]]></category><category><![CDATA[loki]]></category><category><![CDATA[Grafana]]></category><dc:creator><![CDATA[Jachen Duschletta]]></dc:creator><pubDate>Thu, 26 Sep 2024 18:46:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727375092861/200aa8c7-c726-4404-ad1c-45e7467422a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727375161950/71095e9a-fa77-4ea2-8773-b5f78444659f.png" alt class="image--center mx-auto" /></a></p>
<h2 id="heading-supercharge-your-observability-using-cloudflare-tail-workers-with-qryn">Supercharge Your Observability: Using Cloudflare Tail Workers with qryn</h2>
<p>In the ever-evolving landscape of cloud computing and web services, observability has become a critical aspect of maintaining robust and efficient systems. Today, we're excited to explore a powerful combination: <a target="_blank" href="https://developers.cloudflare.com/workers/observability/logs/tail-workers/">Cloudflare Tail Workers</a> and <a target="_blank" href="https://qryn.dev">qryn, the polyglot observability stack compatible with Loki, Prometheus, Tempo and Pyroscope</a>. This integration allows you to stream logs and events from your Cloudflare Workers directly into your observability platform, providing real-time insights and enhancing your ability to monitor and troubleshoot your applications.</p>
<h3 id="heading-what-are-cloudflare-tail-workers">What are Cloudflare Tail Workers?</h3>
<p><a target="_blank" href="https://developers.cloudflare.com/workers/observability/logs/tail-workers/">Cloudflare Tail Workers</a> are a special type of Cloudflare Worker that allows you to process and forward logs and events from your other Workers in real-time. They act as a "tail" to your main Workers, catching and processing the output stream. This feature is incredibly useful for:</p>
<ol>
<li><p>Real-time log analysis</p>
</li>
<li><p>Error tracking and alerting</p>
</li>
<li><p>Performance monitoring</p>
</li>
<li><p>Security event processing</p>
</li>
</ol>
<p><a target="_blank" href="https://developers.cloudflare.com/workers/observability/logs/tail-workers/">Tail Workers</a> receive batches of events from your main Workers, allowing you to process, filter, or forward these events to external systems – in our case, <a target="_blank" href="https://qryn.dev">qryn</a>.</p>
<p><img src="https://developers.cloudflare.com/_astro/tail-workers.CaYo-ajt_19unHR.webp" alt="Tail Worker diagram" /></p>
<h2 id="heading-configuring-cloudflare-tail-workers">Configuring Cloudflare Tail Workers</h2>
<p>To set up a Tail Worker, follow these steps:</p>
<ol>
<li><p>Log in to your <strong>Cloudflare</strong> dashboard.</p>
</li>
<li><p>Navigate to the <strong>Workers</strong> section.</p>
</li>
<li><p>Click <strong>"Create a Service"</strong> and choose <strong>"Tail Worker"</strong> as the type.</p>
</li>
<li><p>Give your <strong>Tail Worker</strong> a name and click <strong>"Create Service"</strong>.</p>
</li>
<li><p>In the editor, paste the code for qryn ingestion (shown below).</p>
</li>
<li><p><strong>Save</strong> and <strong>Deploy</strong> your Tail Worker.</p>
</li>
</ol>
<h2 id="heading-connecting-tail-workers-to-main-workers">Connecting Tail Workers to Main Workers</h2>
<p>Add the following to the <code>wrangler.toml</code> file of your Main Worker(s):</p>
<pre><code class="lang-bash">tail_consumers = [{service = <span class="hljs-string">"&lt;TAIL_WORKER_NAME&gt;"</span>}]
</code></pre>
<p>Alternatively, use the following procedure in the Cloudflare dashboard:</p>
<ol>
<li><p>Go to the main <strong>Worker</strong> you want to monitor.</p>
</li>
<li><p>In the <strong>Settings</strong> tab, find the <strong>"Tail Workers"</strong> section.</p>
</li>
<li><p>Select your newly created <strong>Tail Worker</strong> from the dropdown.</p>
</li>
<li><p><strong>Save</strong> the changes.</p>
</li>
</ol>
<p><em>Now, your Tail Worker will receive events from the main Worker.</em></p>
<h3 id="heading-the-code-tail-worker-for-qryn">The Code: Tail Worker for qryn</h3>
<p>Here's our Tail Worker code that processes events and sends them to qryn.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Function to create Loki POST query</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createLokiPostQuery</span>(<span class="hljs-params">data</span>) </span>{
  <span class="hljs-keyword">const</span> streams = data.map(<span class="hljs-function"><span class="hljs-params">item</span> =&gt;</span> {
    <span class="hljs-keyword">const</span> labels = {
      <span class="hljs-attr">scriptName</span>: item.scriptName,
      <span class="hljs-attr">outcome</span>: item.outcome,
      <span class="hljs-attr">url</span>: item.event.request.url,
      <span class="hljs-attr">method</span>: item.event.request.method,
      <span class="hljs-attr">colo</span>: item.event.request.cf.colo
    };

    <span class="hljs-keyword">const</span> labelString = <span class="hljs-built_in">Object</span>.entries(labels)
      .map(<span class="hljs-function">(<span class="hljs-params">[key, value]</span>) =&gt;</span> <span class="hljs-string">`<span class="hljs-subst">${key}</span>="<span class="hljs-subst">${value}</span>"`</span>)
      .join(<span class="hljs-string">','</span>);

    <span class="hljs-keyword">const</span> entries = [];

    <span class="hljs-comment">// Process logs</span>
    item.logs.forEach(<span class="hljs-function"><span class="hljs-params">log</span> =&gt;</span> {
      entries.push({
        <span class="hljs-attr">ts</span>: log.timestamp.toString() + <span class="hljs-string">'000000'</span>, <span class="hljs-comment">// Convert to nanoseconds</span>
        <span class="hljs-attr">line</span>: <span class="hljs-built_in">JSON</span>.stringify({
          <span class="hljs-attr">level</span>: log.level,
          <span class="hljs-attr">message</span>: log.message.join(<span class="hljs-string">' '</span>)
        })
      });
    });

    <span class="hljs-comment">// Process exceptions</span>
    item.exceptions.forEach(<span class="hljs-function"><span class="hljs-params">exception</span> =&gt;</span> {
      entries.push({
        <span class="hljs-attr">ts</span>: exception.timestamp.toString() + <span class="hljs-string">'000000'</span>, <span class="hljs-comment">// Convert to nanoseconds</span>
        <span class="hljs-attr">line</span>: <span class="hljs-built_in">JSON</span>.stringify({
          <span class="hljs-attr">type</span>: <span class="hljs-string">'exception'</span>,
          <span class="hljs-attr">name</span>: exception.name,
          <span class="hljs-attr">message</span>: exception.message
        })
      });
    });

    <span class="hljs-comment">// Process diagnosticsChannelEvents</span>
    item.diagnosticsChannelEvents.forEach(<span class="hljs-function"><span class="hljs-params">event</span> =&gt;</span> {
      entries.push({
        <span class="hljs-attr">ts</span>: event.timestamp.toString() + <span class="hljs-string">'000000'</span>, <span class="hljs-comment">// Convert to nanoseconds</span>
        <span class="hljs-attr">line</span>: <span class="hljs-built_in">JSON</span>.stringify({
          <span class="hljs-attr">type</span>: <span class="hljs-string">'diagnosticsChannel'</span>,
          <span class="hljs-attr">channel</span>: event.channel,
          <span class="hljs-attr">message</span>: event.message
        })
      });
    });

    <span class="hljs-keyword">return</span> {
      <span class="hljs-comment">// The Loki push API expects a flat object of label key/value pairs</span>
      <span class="hljs-attr">stream</span>: labels,
      <span class="hljs-attr">values</span>: entries.map(<span class="hljs-function"><span class="hljs-params">entry</span> =&gt;</span> [entry.ts, entry.line])
    };
  });

  <span class="hljs-keyword">return</span> {
    <span class="hljs-attr">streams</span>: streams
  };
}

<span class="hljs-comment">// Cloudflare Worker</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> {
  <span class="hljs-keyword">async</span> tail(events) {
    <span class="hljs-comment">// Process events using our createLokiPostQuery function</span>
    <span class="hljs-keyword">const</span> lokiPostQuery = createLokiPostQuery(events);

    <span class="hljs-comment">// Grafana Loki API endpoint</span>
    <span class="hljs-keyword">const</span> lokiApiUrl = <span class="hljs-string">'https://qryn.server/loki/api/v1/push'</span>;

    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(lokiApiUrl, {
        <span class="hljs-attr">method</span>: <span class="hljs-string">'POST'</span>,
        <span class="hljs-attr">headers</span>: {
          <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/json'</span>,
        },
        <span class="hljs-attr">body</span>: <span class="hljs-built_in">JSON</span>.stringify(lokiPostQuery),
      });

      <span class="hljs-keyword">if</span> (!response.ok) {
        <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`HTTP error! status: <span class="hljs-subst">${response.status}</span>`</span>);
      }

      <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> response.text();
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Successfully sent data to Loki:'</span>, result);
    } <span class="hljs-keyword">catch</span> (error) {
      <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error sending data to Loki:'</span>, error);
    }
  }
};
</code></pre>
<p>☝️ <mark>Update the </mark> <strong><mark>lokiApiUrl </mark></strong> <mark>parameter with your </mark> <a target="_blank" href="https://qryn.dev"><mark>qryn </mark></a> <mark>or </mark> <a target="_blank" href="https://qryn.cloud"><mark>qryn.cloud</mark></a> <mark> URL endpoint </mark> ☝️</p>
<h2 id="heading-ready-to-tail">🔎 Ready to Tail</h2>
<p>Your Workers’ tail logs should appear in your qryn instance. Select them using the Cloudflare Tail Worker labels and search/filter/transform the logs using the <strong>Logs Explorer</strong>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678209595876/26a84aca-e64a-4dda-9214-8da32adbb908.png?auto=compress,format&amp;format=webp" alt /></p>
<h3 id="heading-tail-worker-labels">🔎 Tail Worker Labels:</h3>
<ul>
<li><em>scriptName, outcome, url, method, colo</em></li>
</ul>
<h3 id="heading-benefits-of-this-integration">🎱 Benefits of This Integration</h3>
<p>By integrating Cloudflare Tail Workers with qryn, you instantly gain:</p>
<ol>
<li><p><strong>Real-time log streaming</strong>: Get instant visibility into your Workers' behavior.</p>
</li>
<li><p><strong>Centralized observability</strong>: Collect logs from all your Workers in one place.</p>
</li>
<li><p><strong>Advanced querying and visualization</strong>: Leverage qryn's powerful features and its native integration with Grafana to analyze your data.</p>
</li>
<li><p><strong>Scalability</strong>: Handle high volumes of log data with ease.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The combination of <strong>Cloudflare Tail Workers</strong> and <strong>qryn</strong> offers a robust solution for real-time observability of your Cloudflare Workers. By following the steps outlined in this post, you can set up a powerful logging pipeline that will give you deeper insights into your applications' performance and behavior.</p>
<p>Remember, good observability practices are key to maintaining reliable and efficient systems. Start leveraging these tools today, and take your monitoring capabilities to the next level!</p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[💠 Altinity Cloud Observability with qryn]]></title><description><![CDATA[Earlier this week our friends at Altinity released a guide on configuring Altinity Cloud stack observability using Grafana Cloud. Since qryn is a drop-in Grafana Cloud replacement (and so much more) with native ClickHouse storage, this is our redempt...]]></description><link>https://blog.gigapipe.com/altinity-cloud-observability-with-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/altinity-cloud-observability-with-qryn</guid><category><![CDATA[observability]]></category><category><![CDATA[qryn]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[ClickHouse]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[opensource]]></category><category><![CDATA[gigapipe]]></category><dc:creator><![CDATA[Jachen Duschletta]]></dc:creator><pubDate>Wed, 25 Sep 2024 11:05:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727125039614/b12cbd5e-5595-44db-b12a-fcdcada06e8e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727167481993/e21c28eb-ec6d-4b3a-af2b-5a932958c73d.png" alt /></p>
<p><em>Earlier this week our friends at</em> <a target="_blank" href="https://altinity.com"><em>Altinity</em></a> <em>released a</em> <a target="_blank" href="https://altinity.com/blog/how-to-monitor-metrics-and-logs-from-altinity-cloud-in-grafana-cloud"><em>guide</em></a> <em>on configuring Altinity Cloud stack observability using Grafana Cloud. Since qryn is a drop-in Grafana Cloud replacement (and so much more) with native ClickHouse storage, this is our redemption guide for Altinity Ops</em> 😉</p>
<h3 id="heading-altinity-cloud-observability-with-qryn">▶️ Altinity Cloud Observability with qryn</h3>
<p>In this blog post, we’ll show you how to keep an eye on your <strong>Altinity-managed ClickHouse</strong> clusters using the <strong>qryn</strong> polyglot observability stack (or <strong>Gigapipe Cloud</strong>).</p>
<p>With the <a target="_blank" href="https://altinity.com"><em>Altinity</em></a> Cloud Manager, you can quickly configure your ClickHouse environment to send observability signals into qryn using Prometheus and Loki APIs and instantly use Grafana to create and share real-time visualizations and alerts.</p>
<p>We’ll walk you through the steps to:</p>
<ul>
<li><p>Setup <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> or sign-up for <a target="_blank" href="https://qryn.cloud">Gigapipe Cloud</a></p>
</li>
<li><p>Send <strong>Prometheus</strong> metrics to <strong>qryn</strong></p>
</li>
<li><p>Send <strong>ClickHouse Logs</strong> to <strong>qryn</strong></p>
</li>
<li><p>Visualize and Explore data with <strong>Grafana</strong> and <strong>qryn</strong></p>
</li>
</ul>
<blockquote>
<p><mark>GOAL: Store qryn data in Altinity Cloud to kill Observability costs!</mark></p>
</blockquote>
<p><a target="_blank" href="https://qryn.dev"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt /></a></p>
<h3 id="heading-get-polyglot">▶️ Get Polyglot</h3>
<p>If you already have a <a target="_blank" href="https://qryn.metrico.in/#/installation"><strong>qryn</strong> setup</a>, <em>you’re ready to go!</em></p>
<p>⚙️ If you’re using K8s: <a target="_blank" href="https://github.com/metrico/qryn-helm">https://github.com/metrico/qryn-helm</a></p>
<p>⚙️ If you’d like a quick local setup use our <a target="_blank" href="https://github.com/metrico/qryn-oss-demo">qryn-demo</a> bundle.</p>
<p>⚙️ If you’d like to keep things cloudy, <a target="_blank" href="https://gigapipe.com/">signup for an Account on Gigapipe</a></p>
<h3 id="heading-setup-prometheus-qryn">▶️ Setup Prometheus + qryn</h3>
<p>The <strong>qryn</strong> stack supports the <strong>Prometheus API</strong> out of the box, including <strong>remote_write</strong> capabilities. Using qryn and/or Gigapipe cloud, all you need is the <strong>qryn service URL</strong> and <em>optional Authentication and Partitioning tokens</em>:</p>
<ul>
<li><p>From your <strong>Altinity Cloud Manager</strong> browse your list of <strong>environments</strong>, selecting the one you want to monitor and use the three-dot menu to <strong>edit the settings</strong>.</p>
</li>
<li><p>In the <strong>Environment Configuration</strong> dialog select the <strong>Metrics</strong> tab to configure</p>
</li>
<li><p>Once there, find the <strong>External Prometheus</strong> section and enter the <strong>qryn ingestion URL</strong> into the <strong>Remote URL</strong> field. <em>When using Gigapipe, remember to add the API-Key as the</em> <strong><em>Auth User</em></strong> <em>and the API-Secret as the</em> <strong><em>Auth Password</em></strong>.</p>
<pre><code class="lang-plaintext">  https://qryn.local:3100/api/v1/prom/remote/write
  -- OR -- 
  https://qryn.gigapipe.com/api/v1/prom/remote/write
</code></pre>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727118943043/70be8ef8-e6cb-47bc-9422-65ce2d8009fd.png" alt class="image--center mx-auto" /></p>
<p>  Click the <strong>OK</strong> button to save the changes. This will <strong>activate</strong> the required connections and start sending metrics to your qryn instance.</p>
</li>
</ul>
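<p>For reference, if you also run your own <strong>Prometheus</strong> and want it to ship metrics to the same endpoint, the equivalent <code>remote_write</code> block in <code>prometheus.yml</code> would look like this (credentials are placeholders):</p>
<pre><code class="lang-yaml">remote_write:
  - url: "https://qryn.gigapipe.com/api/v1/prom/remote/write"
    basic_auth:
      username: "API-KEY"     # Gigapipe API-Key
      password: "API-SECRET"  # Gigapipe API-Secret
</code></pre>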
<h3 id="heading-setup-loki-logs-qryn">▶️ Setup Loki Logs + qryn</h3>
<p>Let’s perform the same operation to emit our Logs using the <strong>qryn Loki API</strong></p>
<ul>
<li><p>From your <strong>Altinity Cloud Manager</strong>, browse your list of <strong>environments</strong>, select the one you want to monitor, and use the three-dot menu to <strong>edit its settings</strong>.</p>
</li>
<li><p>In the <strong>Environment Configuration</strong> dialog, select the <strong>Metrics</strong> tab.</p>
</li>
<li><p>Once there, find the <strong>External Loki URL</strong> field and enter the qryn Loki push URL:</p>
<pre><code class="lang-plaintext">  https://qryn.local:3100/loki/api/v1/push
  -- OR --
  https://API-KEY:API-SECRET@qryn.gigapipe.com/loki/api/v1/push
</code></pre>
</li>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727118969291/b944e3f5-d366-4980-b314-9cfa9b296716.png" alt class="image--center mx-auto" /></p>
<p>  Click the <strong>OK</strong> button to save the changes, which will <strong>activate</strong> the connection and send logs to your qryn or Gigapipe Cloud instance. <em>We’re all set!</em></p>
</li>
</ul>
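<p>Under the hood, anything that can issue an HTTP POST can ship logs to this endpoint. Here is a minimal sketch of the push payload format (the <code>job</code> label and log line are made up for illustration):</p>

```python
import json
import time

def loki_push_payload(labels, lines):
    """Build the JSON body expected by POST /loki/api/v1/push.

    Loki-compatible APIs take one or more streams, each with a label set
    and a list of [timestamp, line] pairs, where the timestamp is
    nanoseconds since the epoch encoded as a string.
    """
    ts = str(time.time_ns())
    return json.dumps({
        "streams": [
            {"stream": labels, "values": [[ts, line] for line in lines]}
        ]
    })

# Hypothetical label set and log line for illustration
body = loki_push_payload({"job": "altinity-test"}, ["hello from qryn"])
```

<p>POSTing this body with a <code>Content-Type: application/json</code> header to the push URL above is all it takes.</p>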
<h3 id="heading-exploring-metrics">🔎 Exploring Metrics</h3>
<p>Let’s open our qryn <strong>Grafana</strong> or <strong>Gigapipe Grafana</strong> to browse our datasources.</p>
<p>Navigate to the <strong>Explore</strong> tab and use the <strong>New Metric Exploration</strong> application to see an overview of all the received <strong>Altinity Cloud Metrics</strong>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727119036074/dbd46faf-c573-421b-8417-fafce4076252.png" alt class="image--center mx-auto" /></p>
<p>The <strong>labels</strong> help you select the relevant parts of your cluster and generate queries to observe your stack from multiple angles. Each of these label-based queries can be added to a dashboard or used for alerting right from inside Grafana, using the <strong>Select</strong> button to start. Each visualization is labeled with the name of the metric itself.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeFE2ZojDiQaMvqXXzS_2BdVhwyHXjsBd_jSDoOPBPzXBHRNiXJyOEvzLhC7B6keclW_f8GY__C47ELrhprGxiRch5aFGFU0cTXJXydZHVnimRnb5Hzv6wiIHAA-X9D73ryLlFvKRZSIRh8Je2WTETNiHA?key=6FY221Nv0HBmhbq4R2Ye6Q" alt /></p>
<p>Every metric sent to the Prometheus server has one or more labels attached to it. We can filter what we see by selecting one or more labels. Click the <strong>Add label</strong> button to add a label to the query. When you click the button, you’ll see a dropdown list of labels from all the metrics sent to this server:</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdMN1l0tRQ2muu9APv21r6dFEtDFeGjIPIXLABWH1j5dxJSZaiJM4p8RUol-I8SgfqPeBceyGDidYrU9JN5RLqrJUJJo0PJBaguHGcbXWs129KDDfPQ4ml2_gIsKXMenUD0WaO8XyhTeyb9OFYkLNG7_G_t?key=6FY221Nv0HBmhbq4R2Ye6Q" alt /></p>
<h3 id="heading-exploring-logs">🔎 Exploring Logs</h3>
<p>Click the <strong>Label browser</strong> button to see the labels available in your qryn instance:</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfIcvP2xFoP6HrkeTjjRlNEabn_iNedIRNN01hZ4PilxMXD3p_1MS3-xM_pNLH4B8UNqbxHqGGlXyg3JfcYCLCjTW0w96k75vy-f3bt7l_9vsHqOkRRiHXbaQgw-NHtMUCC9W9bgvvW0SkPTi7og0zwWPo?key=6FY221Nv0HBmhbq4R2Ye6Q" alt /></p>
<p>Let’s start with some well known labels: <code>namespace</code> and <code>pod</code>.</p>
<p>Browse and select the available values to see logs produced by them:</p>
<p><code>{namespace="altinity-maddie-na",pod="clickhouse-operator-7778d7cfb6-9q8ml"}</code></p>
<p>Click the <strong>Show Logs</strong> button to see all the matching messages from those services.</p>
<p><em><mark>That’s it! You’re ready to use LogQL features to filter, extract and transform logs!</mark></em></p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXe-P-tzadEcpenr0cagTg7SBXg26fAzZMiyDPJ627OqGcWsyT4AoexLsc8kn6hi3G4p2eQ6HU_sCWi-Hn-Q6puaA_BPF17TLT3Jx6pmRYEEey8CWQDaUiz48zrxKsmj6beQog9gV7H04rja568W6Iid_Yd8?key=6FY221Nv0HBmhbq4R2Ye6Q" alt /></p>
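<p>As an illustration, a single LogQL pipeline can do all three. The stream selector matches the earlier example, while the <code>json</code> parser and the <code>level</code>/<code>message</code> fields assume JSON-formatted log lines:</p>
<pre><code class="lang-plaintext">{namespace="altinity-maddie-na"} |= "Error"
  | json
  | line_format "{{.level}}: {{.message}}"
</code></pre>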
<h3 id="heading-advanced-techniques">🎱 Advanced Techniques</h3>
<p>Logs can be used to create ad hoc statistics, such as error rates or counts of ‘query types’ over time, enabling alerting and monitoring on important health-related status logs.</p>
<p>To build such a statistic, we can use a query that filters the logs down to something interesting (e.g. logs that contain the word ‘Error’) and then use a counter query to see the number of occurrences of the Error logs over time.</p>
<pre><code class="lang-plaintext">count_over_time({job="dummy-server"} |~ `Error` [$__auto])
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727211745563/1074bf42-e0c8-4bd5-95e4-32aee411a466.png" alt class="image--center mx-auto" /></p>
<p>This allows us to see how many Errors are occurring and whether they increase over time. Using this query, we can also set an alert to detect when Errors pass a threshold.</p>
<p>To add an alert, we can navigate to the <strong>Alerting</strong> Menu inside Grafana and select <strong>Manage Alert Rules</strong>. Once there, we click <strong>New Alert rule</strong> to create a new rule.</p>
<p>Now we can use the above query to create an alert for ‘Error’ log lines. Using this alert, Grafana can notify us when our cumulative ‘Error’ count goes above 100 within a 10-minute span.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727212164510/7da47011-54ff-4ccf-918a-0b2ea871e846.png" alt class="image--center mx-auto" /></p>
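<p>For reference, the alert condition described above can be written as a single LogQL expression (the <code>job</code> label is the same illustrative one used earlier; adjust the selector to your own streams):</p>
<pre><code class="lang-plaintext">sum(count_over_time({job="dummy-server"} |~ `Error` [10m])) &gt; 100
</code></pre>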
<p>Through the power of qryn and Grafana, you can be alerted on metric spikes, log rates and occurrences, making it faster to find and resolve issues in your system.</p>
<h3 id="heading-conclusion">👁️‍🗨️ Conclusion</h3>
<p><em>If you use ClickHouse, use it all the way and store your observability data in OLAP!</em></p>
<p>With <a target="_blank" href="https://qryn.dev">qryn</a> you get the same APIs and features as Grafana Cloud, as a thin overlay on top of your existing ClickHouse storage - retaining control of costs and storage, with no dependencies on third-party providers and their usage-based plans.</p>
<blockquote>
<p>Don’t drop your data to save. Drop your expensive Observability provider!</p>
</blockquote>
<p><a target="_blank" href="https://gigapipe.com"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727275637837/d1d81bd3-f337-46ec-b23a-8e5ace8178f0.png" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[📈 Profiling² with Qryn Adventures pt. 1]]></title><description><![CDATA[Profiling performance will take you places - and at times - the improvement process can spin off into quite interesting journeys often leading to unexpected results. Performance tuning is one of significant parts for qryn development, and we are extr...]]></description><link>https://blog.gigapipe.com/profiling-with-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/profiling-with-qryn</guid><category><![CDATA[qryn]]></category><category><![CDATA[pyroscope]]></category><category><![CDATA[profiling]]></category><category><![CDATA[continuous profiling]]></category><category><![CDATA[pprof]]></category><category><![CDATA[PGO]]></category><category><![CDATA[golang]]></category><dc:creator><![CDATA[Volodymyr Akchurin]]></dc:creator><pubDate>Wed, 04 Sep 2024 09:15:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727377845373/abf39a98-61e4-4961-9c36-5086e894a096.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725387322397/4048aecc-61f2-482a-938b-9155c99230d5.png" alt class="image--center mx-auto" /></a></p>
<p><strong>Profiling</strong> performance will take you places - <em>and at times</em> - the improvement process can spin off into quite interesting journeys, often leading to <em>unexpected results. Performance tuning is a significant part of qryn development,</em> and we are extremely proud of being able to do it using <a target="_blank" href="https://qryn.dev">qryn</a> itself!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725748680993/76aa317a-a0bc-41a2-bd0c-b83d310dfd11.gif" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><a target="_blank" href="https://blog.qryn.dev/pyroscope-qryn">Check our Pyroscope Continuous Profiling announcement </a>🔥🔥🔥🔥</div>
</div>

<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725391051656/e5a773d2-d5ff-4ce5-b899-cd611ddb5770.png" alt class="image--center mx-auto" /></p>
<p>Today we'll share a small sample from our development experience in using <a target="_blank" href="https://go.dev/doc/pgo">PGO</a> to optimize the performance of the qryn <a target="_blank" href="https://github.com/metrico/otel-collector">OpenTelemetry</a> collector. Throughout the article, we'll examine the effectiveness of <a target="_blank" href="https://go.dev/doc/pgo">PGO</a> and analyze the results of our profiling exercise using <a target="_blank" href="https://qryn.dev">qryn</a> to explore the practical implications of this optimization technique.</p>
<h3 id="heading-application-benchmark-setup">Application Benchmark Setup</h3>
<p>In order to perform our optimization and demonstrate the results, <em>we need an application to optimize</em>. The application should be under a constant load for a significant amount of time <em>(the longer, the better; let's say at least 20 minutes)</em>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725388470542/04c28b7f-fc2c-4f16-bf97-ed18dc98d3e9.png" alt class="image--center mx-auto" /></p>
<p>We will use the <strong>Pyroscope</strong> client to collect the profile data and some readers to access the collected profiles.</p>
<p>We will run an <a target="_blank" href="https://github.com/metrico/otel-collector">OpenTelemetry</a> collector accepting application profiles through the <strong>Pyroscope-compatible API</strong> and saving them directly into the <a target="_blank" href="https://qryn.dev">qryn</a> database.</p>
<p>The running <a target="_blank" href="https://github.com/metrico/otel-collector">OpenTelemetry</a> collector features a <strong>Pyroscope</strong> extension that sends its own profiles to the accepting endpoint described earlier.</p>
<p><strong><em>Thus, in this specific example, the collector will profile itself</em></strong> 🔥</p>
<p>Apart from the <a target="_blank" href="https://github.com/metrico/otel-collector">OpenTelemetry</a> collector, we will leverage a <a target="_blank" href="https://qryn.dev">qryn</a> instance reading from the same database and a Grafana instance providing the user interface to read the saved profiles. <em>Let's start our profiling exercise!</em></p>
<p><img src="https://i.ytimg.com/vi/XtdpaLnVtAA/hq720.jpg?sqp=-oaymwEhCK4FEIIDSFryq4qpAxMIARUAAAAAGAElAADIQj0AgKJD&amp;rs=AOn4CLBFI6A9tpmeH8OV2vthPmsTWgRxSw" alt="20 Minutes Later | Spongebob Time Cards" /></p>
<h1 id="heading-pgo-optimization-of-the-collector">PGO optimization of the Collector</h1>
<p>This example optimization process will be quite simple:</p>
<ol>
<li><p>Wait for 10-20 minutes to collect a sufficient amount of <strong>pprof</strong> data.</p>
</li>
<li><p>Merge the collected <strong>pprof</strong> observations into one result pprof.</p>
</li>
<li><p>Add a <code>-pgo pgo.pprof</code> flag to the <code>go build</code> directive to build the optimized application.</p>
</li>
</ol>
<p>Let's begin by analyzing the outcome of our first step:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724952400402/f28be144-e616-41f6-9ff9-4bc9545849c2.png" alt class="image--center mx-auto" /></p>
<p>The results are quite clear and easy to understand:</p>
<ul>
<li><p>About 50% of the time was taken by the Go garbage collector (GC)</p>
</li>
<li><p><code>runtime.mca...</code> is likely the HTTP server</p>
</li>
<li><p><code>(4) go.opente</code> is the qryn exporter writing to the qryn database</p>
</li>
<li><p><a target="_blank" href="http://github.com/"><code>github.com/</code></a> is the Pyroscope client itself</p>
</li>
</ul>
<p>Next, let's try to combine the intermediate <strong>pprofs</strong> using the <a target="_blank" href="https://grafana.com/docs/pyroscope/latest/view-and-analyze-profile-data/profile-cli/"><strong>profilecli</strong></a> tool:</p>
<pre><code class="lang-bash">$ profilecli query merge \
  --from=<span class="hljs-string">'2024-08-27T15:37:45+03:00'</span> \
  --to=<span class="hljs-string">'2024-08-27T16:00:30+03:00'</span> \
  --url=http://localhost:3100 | head -n 10
level=info msg=<span class="hljs-string">"query aggregated profile from profile store"</span> url=http://localhost:3100 from=2024-08-27T15:37:45+03:00 to=2024-08-27T16:00:30+03:00 query={} <span class="hljs-built_in">type</span>=process_cpu:cpu:nanoseconds:cpu:nanoseconds
PeriodType: cpu nanoseconds
Period: 10000000
Time: 2024-08-27 15:37:55.600057479 +0300 EEST
Duration: 22m4
Samples:
samples/count cpu/nanoseconds
          4   40000000: 1 2 3 4 5 6 
          2   20000000: 7 8 9 10 11 12 13 
          2   20000000: 14 15 16 17 18 5 6 
          2   20000000: 19 20 21 22 23 24 25 26 27
$ profilecli query merge \
  --from=<span class="hljs-string">'2024-08-27T15:37:45+03:00'</span> \
  --to=<span class="hljs-string">'2024-08-27T16:00:30+03:00'</span> \
  --url=http://localhost:3100 \
  --output=pprof=./pgo.pprof
level=info msg=<span class="hljs-string">"query aggregated profile from profile store"</span> url=http://localhost:3100 from=2024-08-27T15:37:45+03:00 to=2024-08-27T16:00:30+03:00 query={} <span class="hljs-built_in">type</span>=process_cpu:cpu:nanoseconds:cpu:nanoseconds
$ ls -la | grep pgo.pprof
-rw-r--r--   1 user user     30324 сер 29 20:42 pgo.pprof
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em>A fresh 30kB pprof file is generated by the above </em><strong><em>profilecli </em></strong><em>operation</em></div>
</div>

<p>Since we mostly use a <strong>dockerized OpenTelemetry</strong> collector to run our test, we can simply add the <code>-pgo</code> flag to the <strong>Dockerfile</strong> and build it:</p>
<pre><code class="lang-bash">$ cat cmd/otel-collector/Dockerfile 
<span class="hljs-comment"># Builder stage</span>
FROM golang:1.22.1-alpine as builder
RUN apk --update add ca-certificates
<span class="hljs-comment"># ...</span>
RUN <span class="hljs-built_in">cd</span> cmd/otel-collector &amp;&amp; \
  go build -pgo=/src/pgo.pprof -tags timetzdata -o /out/otel-collector

<span class="hljs-comment"># Final stage</span>
FROM scratch

<span class="hljs-comment"># ...</span>
ENTRYPOINT [<span class="hljs-string">"/otel-collector"</span>]
CMD [<span class="hljs-string">"--config"</span>, <span class="hljs-string">"/etc/otel/config.yaml"</span>]

$ docker build \
  -t otel-collector:latest \
  -f cmd/otel-collector/Dockerfile .
[+] Building 61.5s (13/13) FINISHED                                            docker:default
 =&gt; [internal] load build definition from Dockerfile                                     0.1s
 =&gt; =&gt; transferring dockerfile: 579B                                                     0.0s
 =&gt; WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 2)           0.1s
 =&gt; [internal] load metadata for docker.io/library/golang:1.22.1-alpine                  1.6s
...
 =&gt; =&gt; naming to docker.io/library/otel-collector:latest                                 0.0s
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Note: the full example Dockerfile can be <a target="_blank" href="https://github.com/metrico/otel-collector/blob/main/cmd/otel-collector/Dockerfile">found on our github repository</a></div>
</div>

<p>Let's repeat our test, wait for another 20 minutes, and check the results...</p>
<p><img src="https://i.ytimg.com/vi/XtdpaLnVtAA/hq720.jpg?sqp=-oaymwEhCK4FEIIDSFryq4qpAxMIARUAAAAAGAElAADIQj0AgKJD&amp;rs=AOn4CLBFI6A9tpmeH8OV2vthPmsTWgRxSw" alt="20 Minutes Later | Spongebob Time Cards" /></p>
<h1 id="heading-results-of-the-optimization">Results of the Optimization</h1>
<p><em>According to most blog articles, PGO typically yields a 2-7% improvement.</em></p>
<p><em>First, we want to debunk the idea that multiple iterations could improve the application's performance several times over (spoiler alert: they do not).</em></p>
<p>To find out we performed two iterations of the process and compared results:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724954887655/defee765-9626-469f-9e7c-d2f384f59ddb.png" alt class="image--center mx-auto" /></p>
<p>Both charts represent a 20-minute span of observation, with the application under constant load. The comparison between the Base and the First Round of optimization shows promising results: an improvement of <code>(1-(4.66/4.94))*100=5.67%</code>.</p>
<p>However, as suspected, the Second Round yielded less optimistic results:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724955280015/1a7240ce-cc4a-45ed-9405-f82db9577779.png" alt class="image--center mx-auto" /></p>
<p>🛑 The comparison of <strong>4.94s</strong> vs <strong>4.97s</strong> shows a <code>(1-(4.94/4.97))*100=0.6%</code> downgrade</p>
<p>We can conclude that the second-round performance is similar to <em>(if not slightly worse than)</em> the baseline measurement. This neatly demonstrates that there's definitely lots of value in PGO, but also that there's no such thing as infinite optimization - and no breaking the speed of light :)</p>
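<p>The arithmetic behind these figures is simple enough to sketch (the helper name is ours, not from any tooling):</p>

```python
def pgo_change_pct(reference_s, measured_s):
    """Relative CPU-time change in percent, mirroring the article's
    formula: (1 - measured/reference) * 100."""
    return (1 - measured_s / reference_s) * 100

# First round: 4.94 s baseline vs 4.66 s optimized -> ~5.67% improvement
first_round = pgo_change_pct(4.94, 4.66)
# Second round: 4.97 s vs 4.94 s -> ~0.6%, i.e. within noise
second_round = pgo_change_pct(4.97, 4.94)
```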
<p>The most viable strategy for <strong>PGO</strong> usage:</p>
<ol>
<li><p>Provide a benchmark of the <em>unoptimized</em> Go application.</p>
</li>
<li><p>Observe performance using the <strong>Pyroscope</strong> client and <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a><strong>.</strong></p>
</li>
<li><p>Save the resulting <strong>pprof</strong> for <strong>PGO</strong> needs periodically.</p>
</li>
</ol>
<h1 id="heading-other-notices-during-the-comparison">Other notices during the comparison</h1>
<p>Contrary to my expectations based on articles about PGO, which suggested it would <em>"trim the noodles"</em> on the <strong>flamechart</strong> by in-lining some functions into their callers, it sometimes does quite the opposite:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724956122596/c6ae2e06-aa89-4fe4-b94d-1d042ea2cd69.png" alt class="image--center mx-auto" /></p>
<p><em>(left is baseline, right is optimized)</em></p>
<p>Next: let's take things a step further and examine the <strong>diff view</strong> supported in <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> since the latest <strong>3.2.29</strong> version.</p>
<blockquote>
<p><mark>Hey, did we mention how we're using ONLY </mark> <a target="_blank" href="https://qryn.dev"><strong><mark>qryn</mark></strong></a> <mark>and </mark> <strong><mark>NOT Pyroscope</mark></strong> <mark>to run this test? 😏 We're so proud of our polyglot API supporting Continuous Profiling!</mark></p>
</blockquote>
<h2 id="heading-pyroscope-diff-view">Pyroscope diff view</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724957434439/b57a6c63-31bb-4c55-9637-dde574a7b726.png" alt class="image--center mx-auto" /></p>
<p>Well, obviously, the most improved parts are GC <em>(Garbage Collection)</em> and the <strong>runtime.mcall</strong> HTTP server. <strong>PGO</strong> definitely <em>"trimmed some noodles"</em> there (dark green color in the mid GC part). The rest of the chart needs further investigation.</p>
<p>The <strong>Pyroscope</strong> diff view is not very intuitive to read. The main downside I see is that it uses a percentage comparison to determine the colors. Let's consider an example:</p>
<p>Suppose John, Mike, and I had 2 apples each. We had 6 apples overall, and each of us had 33% of all apples. Then one of my apples was "optimized" out. Now we have 5 apples. John and Mike have 40% each, and I have just 20%.</p>
<ul>
<li><p>Have my apples been optimized? Yes, and my part will be green on the chart.</p>
</li>
<li><p>Have John's and Mike's apples been downgraded? No, they still have 2 apples each. But their parts will be red because their percentage share increased from 33% to 40%.</p>
</li>
</ul>
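<p>The apple arithmetic is easy to verify in a few lines (names are ours):</p>

```python
def shares_pct(counts):
    """Percent share of each owner, rounded to the nearest whole percent."""
    total = sum(counts.values())
    return {who: round(100 * n / total) for who, n in counts.items()}

before = shares_pct({"me": 2, "john": 2, "mike": 2})  # everyone at ~33%
after = shares_pct({"me": 1, "john": 2, "mike": 2})   # one of my apples "optimized" away
```

<p>After the "optimization" John and Mike jump from 33% to 40% without gaining a single apple, which is exactly why their parts turn red on the diff view.</p>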
<p>With this in mind, we should read the diff view as follows:</p>
<ul>
<li><p>green is definitely green</p>
</li>
<li><p>bright red is definitely bright red</p>
</li>
<li><p>faded red may be grey or green</p>
</li>
<li><p>grey may be green</p>
</li>
</ul>
<p>During the comparison process, at some point, I came up with an average determiner of the profile:<br /><code>&lt;value of a sample in ns&gt;/&lt;time between first and last merged observations&gt;</code>.</p>
<p>This would have the measurement of <code>ns/sec</code> and would be a more descriptive representation of the difference between profiles. However, it's quite complicated to calculate at this point.</p>
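<p>As a sketch (the helper name is ours), the determiner is just a sample's value divided by the observation window; e.g. the 40,000,000 ns sample from the profilecli output earlier, spread over its 22m4s (1324 s) merge window:</p>

```python
def profile_rate(sample_ns, window_s):
    """Average CPU time attributed to a sample, in ns per second of observation."""
    return sample_ns / window_s

rate = profile_rate(40_000_000, 1324)  # ~30211 ns/sec
```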
<p>Another approach would be to select the same amount of time <em>(as I did in the explorer before the diff view chapter)</em> and compare the absolute values:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724958587626/11119d67-b7a6-4550-9980-f02d76fdb1d2.png" alt class="image--center mx-auto" /></p>
<p>As you can see here, the absolute difference of the sample is 0ns. But the sample is red due to the 8% difference. It's also worth mentioning that this approach is quite challenging because selection with a mouse without a time picker is never precise.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>The Profile-Guided Optimization (PGO) tool offers a relatively straightforward method to achieve some improvements in application efficiency. The experiments with the OpenTelemetry collector have shown several key insights:</p>
<ol>
<li><p><strong>Performance Gains</strong>: The initial round of PGO optimization resulted in a 5.67% improvement in performance, consistent with the 2-7% range reported in the literature.</p>
</li>
<li><p><strong>Diminishing Returns</strong>: The second round of optimization showed no additional improvement. This suggests that a single, well-executed PGO pass may be sufficient for most applications.</p>
</li>
<li><p><strong>Visualization Challenges</strong>: While the Pyroscope diff view in Grafana is valuable for analyzing optimizations, it takes some practice to read properly. This emphasizes the need for multiple analysis approaches, including absolute value comparisons over fixed time intervals.</p>
</li>
<li><p><strong>Integration with CI/CD</strong>: To fully leverage PGO, it should be integrated into the continuous integration and deployment pipeline, with periodic re-optimization (e.g., every few weeks) to adapt to changing usage patterns.</p>
</li>
<li><p><strong>Future of Profiling Tools</strong>: The limitations observed in current visualization tools, such as Grafana "<strong>Explore Profiles</strong>" plugin, point to opportunities for improvement in profiling and analysis tools, which could further enhance the effectiveness of techniques like PGO.</p>
</li>
</ol>
<p>In conclusion, PGO is "low hanging fruit": a relatively easy way to gain a few percentage points of performance, and a valuable tool in the Go developer's toolkit. As profiling tools and visualization techniques continue to evolve, the process of applying and analyzing PGO optimizations is likely to become even more accessible and insightful for development teams.</p>
<p><mark>And the best part - all you need is </mark> <a target="_blank" href="https://qryn.dev"><strong><mark>qryn</mark></strong></a> <mark>😏 the polyglot observability stack</mark></p>
<h2 id="heading-ready-for-polyglot-observability">Ready for Polyglot Observability?</h2>
<p><em>Logs, Metrics, Traces and Profiles. Your data, your way.</em></p>
<p>You can run <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> self-hosted or managed by <a target="_blank" href="https://qryn.cloud"><strong>qryn.cloud</strong></a></p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt /></a></p>
]]></content:encoded></item><item><title><![CDATA[🐣 Introducing: chsql for DuckDB]]></title><description><![CDATA[TLDR: DuckDB extension providing ClickHouse SQL Dialect Macros

Prequel
Our readers know this has been a ClickHouse centric blog for quite some time. Together we've documented our journey with qryn - the first polyglot observability stack build on to...]]></description><link>https://blog.gigapipe.com/introducing-chsql-for-duckdb</link><guid isPermaLink="true">https://blog.gigapipe.com/introducing-chsql-for-duckdb</guid><category><![CDATA[duckdb-extensions]]></category><category><![CDATA[quackpipe]]></category><category><![CDATA[duckDB]]></category><category><![CDATA[ClickHouse]]></category><category><![CDATA[SQL]]></category><category><![CDATA[chsql]]></category><category><![CDATA[motherduck]]></category><category><![CDATA[chdb]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Thu, 11 Jul 2024 07:58:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1720650702774/69d1fc4c-aee3-4dc9-887b-54de9d1710b7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://private-user-images.githubusercontent.com/1423657/346273832-144dc202-f88a-4a2b-903d-51e30be75f6a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjA2MzgzMzEsIm5iZiI6MTcyMDYzODAzMSwicGF0aCI6Ii8xNDIzNjU3LzM0NjI3MzgzMi0xNDRkYzIwMi1mODhhLTRhMmItOTAzZC01MWUzMGJlNzVmNmEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcxMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MTBUMTkwMDMxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZGY2OWNhNmMwYjQwMzViZThmMjA4ZDJhYTczODdlY2Y0ZmYwNDdlNWFmY2ZmMTE3NmY2MWJmMmFhNzU5YmU4ZiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.eiuAGea-cmRHRg3Vfv8QZgf4P4xWAbjPPktNaZHIlRo" alt /></p>
<blockquote>
<p><strong>TLDR: DuckDB extension providing ClickHouse SQL Dialect Macros</strong></p>
</blockquote>
<h3 id="heading-prequel">Prequel</h3>
<p>Our readers know this has been a <strong>ClickHouse</strong>-centric blog for quite some time. Together we've documented our journey with <strong>qryn</strong> - <em>the first polyglot observability stack built on top of ClickHouse</em> - and on the side we built a few frankenstein serverless embedded ClickHouse experiments, later even joining efforts founding <em>chdb - quickly killed by world class crooks.</em> <a target="_blank" href="https://blog.qryn.dev/clickhouse-duckdb"><em>We also experimented lots with DuckDB.</em></a></p>
<h3 id="heading-first-quackhttpsgithubcommetricoquackpipe"><a target="_blank" href="https://github.com/metrico/quackpipe">First Quack</a> ⭐</h3>
<p>Once the <em>chdb adventure was aborted</em>, we switched our focus to <a target="_blank" href="https://github.com/metrico/quackpipe"><strong>Quackpipe</strong></a>, a fast and tiny serverless OLAP API powered by <strong>DuckDB</strong>, emulating the <strong>ClickHouse HTTP API</strong> with basic format compatibility and shipping with the same play interface, session persistence and authentication. <strong>Quackpipe</strong> is double-faced and also works as a FIFO processor and as <a target="_blank" href="https://blog.qryn.dev/clickhouse-duckdb">ClickHouse UDFs to run DuckDB queries</a>.</p>
<p>🔥 <a target="_blank" href="https://quackpipe.fly.dev">Curious? Try running a serverless query</a> using our Fly.io <strong>public demo</strong></p>
<h3 id="heading-different-sql-strokes">Different SQL Strokes</h3>
<p>Now <strong>Quackpipe</strong> speaks the <strong>DuckDB SQL</strong> language - <em>which is amazing</em> - but our audience is made of <em>ClickHouse refugees who, just like ourselves, spent years mastering its language conventions</em> - <em>and some of those functions are actually good and useful.</em></p>
<p>Our initial solution was <em>loading a list of ClickHouse SQL aliases at startup time, but this approach was slow, fragile and quite hard to maintain, update and embed.</em></p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Type conversion macros</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toString(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> <span class="hljs-built_in">VARCHAR</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt8(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> <span class="hljs-built_in">INT8</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt16(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> INT16);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt32(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> INT32);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt64(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> INT64);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt128(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> INT128);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt256(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">CAST</span>(expr <span class="hljs-keyword">AS</span> HUGEINT);
<span class="hljs-comment">-- Type conversion with default values</span>
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt8OrZero(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">COALESCE</span>(<span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> <span class="hljs-built_in">INT8</span>), <span class="hljs-number">0</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt16OrZero(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">COALESCE</span>(<span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT16), <span class="hljs-number">0</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt32OrZero(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">COALESCE</span>(<span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT32), <span class="hljs-number">0</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt64OrZero(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">COALESCE</span>(<span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT64), <span class="hljs-number">0</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt128OrZero(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">COALESCE</span>(<span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT128), <span class="hljs-number">0</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt256OrZero(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">COALESCE</span>(<span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> HUGEINT), <span class="hljs-number">0</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt8OrNull(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> <span class="hljs-built_in">INT8</span>);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt16OrNull(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT16);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt32OrNull(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT32);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt64OrNull(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT64);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt128OrNull(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> INT128);
<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">OR</span> <span class="hljs-keyword">REPLACE</span> MACRO toInt256OrNull(expr) <span class="hljs-keyword">AS</span> <span class="hljs-keyword">TRY_CAST</span>(expr <span class="hljs-keyword">AS</span> HUGEINT);
<span class="hljs-comment">-- and so on and on....</span>
</code></pre>
<p>Luckily, something much better was on the horizon....</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1720639202069/d232914b-1c8e-408c-b106-c6dab8c99f97.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Is this a dream? SQL Macros are what we need to start and we can add any missing features or format natively in C++ when we need to....</div>
</div>

<h3 id="heading-duckdb-extensionshttpsduckdborgdocsextensionsoverviewhtml"><a target="_blank" href="https://duckdb.org/docs/extensions/overview.html">DuckDB Extensions</a> 🦆</h3>
<p><strong>DuckDB</strong> has a flexible extension mechanism that allows for dynamically loading extensions on all supported architectures, extending <strong>DuckDB</strong> functionality by providing support for additional file formats, introducing new types, and domain-specific functionality. <em>What would we do without JSON, HTTPFS, Arrow, etc?</em></p>
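<p><em>As a quick sketch of what this looks like in practice (the file name below is just a placeholder), a core extension such as JSON is installed once and then loaded per session:</em></p>
<pre><code class="lang-sql">-- Install the JSON extension once, then load it in each session
INSTALL json;
LOAD json;

-- Query a local JSON file directly (path is a placeholder)
SELECT * FROM read_json_auto('events.json');
</code></pre>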
<p>And while <strong>DuckDB Labs</strong> could have selfishly kept extension development super complex, they decided to do the opposite and invited every developer to the <strong>party of the year</strong>!</p>
<h3 id="heading-duckdb-community-extensionshttpsduckdborg20240705community-extensions"><a target="_blank" href="https://duckdb.org/2024/07/05/community-extensions">DuckDB Community Extensions</a> 🦆🦆🦆</h3>
<blockquote>
<p><strong><em>TL;DR: DuckDB extensions can now be published via the </em></strong><a target="_blank" href="https://github.com/duckdb/community-extensions"><strong><em>DuckDB Community Extensions repository</em></strong></a><strong><em>. The repository makes it easier for users to install extensions using the </em></strong><code>INSTALL ⟨extension name⟩ FROM community</code> syntax. Extension developers avoid the burdens of compilation and distribution.</p>
</blockquote>
<p><mark>This is </mark><strong><mark>not </mark></strong><mark>marketing</mark>. <strong>The full ecosystem is ready to use, with working examples, build actions for all platforms and distribution for community extensions!</strong></p>
<p><em>This is how it's done. The competition is nowhere in sight.</em></p>
<p><img src="https://media1.giphy.com/media/0NwSQpGY6ipgOSt8LL/giphy.gif?cid=6c09b9521djwg58dqkckl7xk380m3snbwqqrv9pdopzremsw&amp;ep=v1_gifs_search&amp;rid=giphy.gif&amp;ct=g" alt /></p>
<p><mark>New plan: </mark> <em><mark>Quackpipe will use an extension to speak "ClickHouse" friendly.</mark></em></p>
<h3 id="heading-clickhouse-sql-extensionhttpscommunity-extensionsduckdborgextensionschsqlhtml">👋 <a target="_blank" href="https://community-extensions.duckdb.org/extensions/chsql.html">ClickHouse SQL Extension</a></h3>
<p>Thanks to the fantastic examples, the <a target="_blank" href="https://community-extensions.duckdb.org/extensions/chsql.html"><strong>chsql</strong></a> extension for DuckDB was born just a few hours later, implementing a small but growing number of native macros using <em><mark>ClickHouse SQL syntax transpiled to DuckDB SQL and lambdas</mark></em>, making it easier to transition knowledge, users and scripts between the two database systems.</p>
<p>Publishing the extension was so easy it felt like cheating. It's quack magic.</p>
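<p><em>If you want to try it yourself, installing from the community repository is two statements away (assuming a DuckDB build with community extensions enabled):</em></p>
<pre><code class="lang-sql">-- Install and load the chsql community extension
INSTALL chsql FROM community;
LOAD chsql;

-- ClickHouse-style macros now work inside DuckDB
SELECT toString(42) AS s, toInt32OrZero('oops') AS safe_int;
</code></pre>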
<p><a target="_blank" href="https://quackpipe.fly.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1720640535933/46a0d950-e3ca-4359-ba5d-3f905858b9a2.png" alt class="image--center mx-auto" /></a></p>
<h3 id="heading-surprise-motherducker">Surprise, MotherDucker!</h3>
<p><strong>Community</strong> is where <strong>DuckDB</strong> really shines. After years of working with vendors trying to kill any open source community effort, this ecosystem <strong><em>feels different</em></strong>.</p>
<p>Respect to everyone at DuckDB and MotherDuck for doing such an amazing job.</p>
<p><em><mark>UPDATE: We're excited to be part of the <strong>MotherDuck Startup Program</strong>! 🦆</mark></em></p>
<p><img src="https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExaDE2aDk4NDJmcDBkOTJ6bzBxdWhmaGJrYzE5ejBjcGo3MW1sbWN2byZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/4JZA2x7GsVFeTbLKlz/giphy.webp" alt /></p>
<p><em>Kudos: As soon as we started working on the extension, and before we could even ask, several Ducks flocked in and kindly helped us overcome a few initial issues, demonstrating how much they respect users and value developer interactions. <strong>A++</strong></em></p>
<p><em>For the first time after years of attempting to work with crippled UDFs in ClickHouse and dealing with the monster size of the project, we see some light ahead in OLAP.</em></p>
<p><strong>So - Expect lots more DuckDB content on the blog as the adventure continues!</strong></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Lesson Learned: Always prefer Motherduckers to Mother%uckers</strong></div>
</div>

<hr />
<h2 id="heading-sql-hackers-join-us">⭐⭐⭐ SQL Hackers, Join us! ⭐⭐⭐</h2>
<p><mark>This is just the beginning of a long journey. </mark> <a target="_blank" href="https://github.com/lmangani/duckdb-extension-clickhouse-sql/discussions/7"><mark>To succeed we'll need your help</mark></a><mark>.</mark></p>
<p>If you're a <strong>ClickHouse</strong> or <strong>DuckDB SQL wizard</strong> <em>(or just have lots of SQL patience)</em> you can join the fun and contribute by adding, fixing or extending supported macros:</p>
<ol>
<li><p>Find a <strong>ClickHouse</strong> function you are interested in from the <a target="_blank" href="https://clickhouse.com/docs/en/sql-reference/functions">functions list</a></p>
</li>
<li><p>Find any <a target="_blank" href="https://duckdb.org/docs/sql">DuckDB functions</a> offering viable methods to alias the target function</p>
</li>
<li><p>Create your macro and extend it to neighboring functions with similar scope.</p>
</li>
<li><p>Test and <a target="_blank" href="https://github.com/lmangani/duckdb-extension-clickhouse-sql">Submit your contribution</a>. We'll do the coding if needed.</p>
</li>
</ol>
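<p><em>To make the workflow concrete, here is what a hypothetical contribution could look like - aliasing ClickHouse's <code>empty()</code> and <code>notEmpty()</code> on top of DuckDB's built-in <code>length()</code> (an illustration, not necessarily how chsql implements them):</em></p>
<pre><code class="lang-sql">-- Hypothetical macros mapping ClickHouse string checks to DuckDB
CREATE OR REPLACE MACRO empty(s) AS (length(s) = 0);
CREATE OR REPLACE MACRO notEmpty(s) AS (length(s) > 0);

SELECT empty('') AS is_empty, notEmpty('duck') AS has_value;
</code></pre>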
]]></content:encoded></item><item><title><![CDATA[🔺eBPF Observability with beyla + qryn]]></title><description><![CDATA[Grafana Beyla is a new vendor agnostic, eBPF-based, OpenTelemetry/Prometheus application auto-instrumentation tool, which lets you easily get started with Application Observability. Within Beyla eBPF is used to automatically inspect application execu...]]></description><link>https://blog.gigapipe.com/ebpf-observability-with-beyla-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/ebpf-observability-with-beyla-qryn</guid><category><![CDATA[eBPF]]></category><category><![CDATA[observability]]></category><category><![CDATA[qryn]]></category><category><![CDATA[distributed tracing]]></category><category><![CDATA[OpenTelemetry]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Wed, 10 Jul 2024 18:59:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1720637714022/88ef0c08-510e-4512-b391-6f86f686f82d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://grafana.com/docs/beyla/latest/"><img src="https://grafana.com/media/docs/grafana-cloud/beyla/beyla-logo.png" alt="Grafana Beyla Logo" /></a></p>
<p><strong>Grafana Beyla</strong> is a new vendor agnostic, <strong>eBPF</strong>-based, <strong>OpenTelemetry/Prometheus</strong> application auto-instrumentation tool, which lets you easily get started with Application Observability. Within Beyla eBPF is used to automatically inspect application executables and the OS networking layer, allowing us to capture essential application observability events for HTTP/S and gRPC services.</p>
<p>From captured eBPF events, Beyla produces <strong>Opentelemetry</strong> trace spans and <strong>Rate-Errors-Duration (RED)</strong> metrics, without any modifications to your application code or configuration and of course - <em>compatible with qryn for ingestion and usage.</em></p>
<h3 id="heading-using-beyla">Using Beyla</h3>
<p><strong>Beyla</strong> supports a wide range of programming languages <em>(Go, Java, .NET, NodeJS, Python, Ruby, Rust, etc.) and can be used in parallel with other existing signals.</em></p>
<p>To get started, refer to the <a target="_blank" href="https://grafana.com/docs/beyla/latest/quickstart/">auto-instrumentation QuickStart</a> in Grafana Beyla.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712651744606/c595223d-7380-4da0-b60e-e6d25ddaae72.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-sending-data-from-beyla-to-qryn">Sending data from Beyla to qryn</h3>
<p><strong>Beyla</strong> was designed for <strong>Grafana LGTM</strong>, which means it's <mark>drop-in compatible with </mark><strong><mark>qryn</mark></strong> when it comes to shipping traces, metrics and even logs using native API formats, thanks to the <em>polyglot all-in-one API approach</em> implemented by the qryn stack.</p>
<p>And since qryn supports direct ingestion of OTEL traces, configuring <strong>Beyla</strong> with the qryn <strong>OTEL</strong> endpoint in Docker is as easy as it gets! Just remember to configure the process <strong>PID, BEYLA_OPEN_PORT</strong> &amp; <strong>BEYLA_SERVICE_NAME</strong> parameters <em>to match your target application process</em> (or docker container) before running:</p>
<pre><code class="lang-bash">docker run --rm --pid="container:clickhouse-server" \
  -e BEYLA_OPEN_PORT=8123 \
  -e BEYLA_SERVICE_NAME="clickhouse" \
  -e OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf" \
  -e OTEL_EXPORTER_OTLP_ENDPOINT="http://qryn:3100" \
  --privileged grafana/beyla:latest
</code></pre>
<p><em><mark>That's all it takes! We're ready to launch our Beyla instance and validate!</mark></em></p>
<p>A few seconds later your process tracing will be visible in <strong>qryn/tempo</strong> by searching for the <strong>BEYLA_SERVICE_NAME</strong> configured in the previous step:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712583192041/d18820e5-36ed-4051-872d-bc9e166165e2.png" alt class="image--center mx-auto" /></p>
<p>That's just the beginning of the journey and there's so much more to discover and use when it comes to <strong>eBPF</strong> and <strong>qryn</strong> combined. Post your comments and feedback!</p>
<p><a target="_blank" href="https://github.com/metrico/qryn"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1720637662372/d2cab846-ce62-40c6-9993-2b3af26a0118.png" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[🐙 Using Grafana alloy with qryn]]></title><description><![CDATA[Grafana just announced Alloy a new flexible, high performance, vendor-neutral distribution of the Opentelemetry (OTel) Collector. Since Alloy is compatible with the most popular open source observability standards such as Opentelemetry, Prometheus an...]]></description><link>https://blog.gigapipe.com/qryn-with-alloy</link><guid isPermaLink="true">https://blog.gigapipe.com/qryn-with-alloy</guid><category><![CDATA[observability]]></category><category><![CDATA[Alloy]]></category><category><![CDATA[qryn]]></category><category><![CDATA[#prometheus]]></category><category><![CDATA[loki]]></category><category><![CDATA[k8s]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Wed, 10 Apr 2024 21:22:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1712781203676/209265d8-9059-4ef9-ba32-ae6bc451b309.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://github.com/grafana"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712782103057/f51fcea1-c720-4def-b623-566483b73756.png" alt /></a></p>
<p>Grafana just announced <a target="_blank" href="https://github.com/grafana/alloy"><strong>Alloy</strong></a>, a new flexible, high-performance, vendor-neutral distribution of the <a target="_blank" href="https://opentelemetry.io/ecosystem/distributions/">Opentelemetry</a> (OTel) Collector. Since Alloy is compatible with the most popular open source observability standards such as <em>Opentelemetry, Prometheus and Loki -</em> it is also <mark>fully compatible out of the box with </mark><a target="_blank" href="https://github.com/metrico/qryn"><strong><mark>qryn</mark></strong></a><strong>!</strong></p>
<p><strong><em>That's our <mark>polyglot </mark> magic making Observability less of a pain for everyone!</em></strong></p>
<p><a target="_blank" href="https://github.com/metrico/qryn"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712780963395/68bf4c7a-3dd7-4704-a7a4-5ba82dbd4732.png" alt class="image--center mx-auto" /></a></p>
<p><a target="_blank" href="https://github.com/grafana/alloy"><strong>Alloy</strong></a> is a vendor-neutral telemetry collector. This means that Alloy doesn’t enforce a specific deployment topology but can work in multiple scenarios acting as a metrics scraper, logs scraper, Opentelemetry receiver and more - with dynamic routing.</p>
<h2 id="heading-the-alloy-pipeline">💦 The Alloy Pipeline</h2>
<p><em>Just like most collectors, Alloy offers its functionality in different stages:</em></p>
<p><strong>1) Collect</strong></p>
<p>Alloy uses more than 120 components to collect telemetry data from applications, databases, and Opentelemetry collectors. Alloy supports collection using multiple ecosystems, including Opentelemetry and Prometheus. Telemetry data can be either pushed to Alloy, or Alloy can pull it (scrape) from your data sources.</p>
<p><strong>2) Transform</strong></p>
<p>Alloy processes data and transforms it for sending. You can use transformations to inject extra metadata into telemetry or filter out unwanted data.</p>
<p><strong>3) Write</strong></p>
<p>Alloy sends data to Opentelemetry-compatible databases or collectors such as Grafana LGTM or <a target="_blank" href="https://github.com/metrico/qryn">qryn</a>. Alloy can also write alerting rules in compatible databases.</p>
<p><a target="_blank" href="https://github.com/metrico/qryn"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1712781147736/fbfd1d32-f717-415b-aae1-4f4297f8722a.png" alt class="image--center mx-auto" /></a></p>
<h2 id="heading-installation">⚙️ Installation</h2>
<p><a target="_blank" href="https://grafana.com/docs/alloy/latest/get-started/install/">Install Alloy on your system</a> following the official <a target="_blank" href="https://grafana.com/docs/alloy/latest/get-started/install/">documentation</a>.</p>
<h3 id="heading-run-a-linux-docker-container"><strong>Run a Linux Docker container</strong></h3>
<p>To run Alloy examples in Docker use the following command in a terminal:</p>
<pre><code class="lang-bash">docker run \
  -v &lt;CONFIG_FILE_PATH&gt;:/etc/alloy/config.alloy \
  -p 12345:12345 \
  grafana/alloy:latest \
    run --server.http.listen-addr=0.0.0.0:12345 --storage.path=/var/lib/alloy/data \
    /etc/alloy/config.alloy
</code></pre>
<p>Replace the following:</p>
<ul>
<li><code>&lt;CONFIG_FILE_PATH&gt;</code>: The path of the configuration file on your host system.</li>
</ul>
<h3 id="heading-data-collection">⚠️ <strong>Data collection</strong> ⚠️</h3>
<p><mark>By default, Grafana Alloy sends anonymous </mark> <em><mark>(but uniquely identifiable)</mark></em> <mark> usage information from your Alloy instance to Grafana Labs.</mark> Use the <code>--disable-reporting</code> command line flag to disable reporting and opt out of this annoyance.</p>
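<p>For example, opting out amounts to adding the flag to the <code>run</code> command from the Docker example above (the config path is a placeholder):</p>
<pre><code class="lang-bash">docker run \
  -v &lt;CONFIG_FILE_PATH&gt;:/etc/alloy/config.alloy \
  -p 12345:12345 \
  grafana/alloy:latest \
    run --disable-reporting --server.http.listen-addr=0.0.0.0:12345 \
    --storage.path=/var/lib/alloy/data /etc/alloy/config.alloy
</code></pre>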
<h2 id="heading-alloy-examples-for-qryn">⚙️ Alloy Examples for qryn</h2>
<p>Here are some generic observability examples combining <a target="_blank" href="https://github.com/metrico/qryn"><strong>Alloy</strong> and <strong>qryn</strong></a><strong>:</strong></p>
<h3 id="heading-prometheus-scraper">🔶 Prometheus Scraper</h3>
<p>Scrape Prometheus Metrics and forward them to qryn for ingestion</p>
<pre><code class="lang-bash">prometheus.scrape <span class="hljs-string">"prometheus"</span> {
  targets = [{
    __address__ = <span class="hljs-string">"localhost:12345"</span>,
  }]
  forward_to     = [prometheus.remote_write.default.receiver]
  job_name       = <span class="hljs-string">"prometheus"</span>
  scrape_timeout = <span class="hljs-string">"45s"</span>
}

prometheus.remote_write <span class="hljs-string">"default"</span> {
  endpoint {
    name = <span class="hljs-string">"qryn"</span>
    url  = <span class="hljs-string">"https://qryn:3100/api/prom/push"</span>

    basic_auth {
      username = <span class="hljs-string">"USERNAME"</span>
      password = <span class="hljs-string">"PASSWORD"</span>
    }

    queue_config {
      capacity             = 2500
      max_shards           = 200
      max_samples_per_send = 500
    }

    metadata_config {
      max_samples_per_send = 500
    }
  }
}
</code></pre>
<h3 id="heading-log-scraper">🔶 Log Scraper</h3>
<p>Scrape Logs from any system and forward them to qryn for ingestion</p>
<pre><code class="lang-bash">local.file_match <span class="hljs-string">"example"</span> {
    path_targets = [{
        __address__ = <span class="hljs-string">"localhost"</span>,
        __path__    = <span class="hljs-string">"/var/log/*.log"</span>,
    }]
}

loki.source.file <span class="hljs-string">"example"</span> {
    targets    = local.file_match.example.targets
    forward_to = [loki.write.default.receiver]
}

loki.write <span class="hljs-string">"default"</span> {
    endpoint {
        url = <span class="hljs-string">"http://qryn:3100/loki/api/v1/push"</span>
    }
    external_labels = {}
}
</code></pre>
<h3 id="heading-otel-collector-for-qryn">🔶 OTel Collector for qryn</h3>
<p>Collect and Forward Opentelemetry protocols to qryn</p>
<pre><code class="lang-bash">otelcol.receiver.otlp <span class="hljs-string">"default"</span> {
    grpc { }

    http { }

    output {
        metrics = [otelcol.processor.memory_limiter.default.input]
        logs    = [otelcol.processor.memory_limiter.default.input]
        traces  = [otelcol.processor.memory_limiter.default.input]
    }
}

otelcol.processor.memory_limiter <span class="hljs-string">"default"</span> {
    check_interval   = <span class="hljs-string">"1s"</span>
    limit_percentage = 90

    output {
        metrics = [otelcol.exporter.otlp.default.input]
        logs    = [otelcol.exporter.otlp.default.input]
        traces  = [otelcol.exporter.otlp.default.input]
    }
}

otelcol.exporter.otlp <span class="hljs-string">"default"</span> {
    client {
        endpoint = <span class="hljs-string">"qryn:3100"</span>
    }
}
</code></pre>
<h3 id="heading-tempo-collector-with-service-graphhttpsgrafanacomdocsalloylatestreferencecomponentsotelcolconnectorservicegraph">🔶 Tempo Collector with <a target="_blank" href="https://grafana.com/docs/alloy/latest/reference/components/otelcol.connector.servicegraph/">Service Graph</a></h3>
<pre><code class="lang-bash">otelcol.receiver.otlp <span class="hljs-string">"default"</span> {
  grpc {
    endpoint = <span class="hljs-string">"0.0.0.0:4320"</span>
  }

  output {
    traces  = [otelcol.connector.servicegraph.default.input,otelcol.exporter.otlp.qryn_tempo.input]
  }
}

otelcol.connector.servicegraph <span class="hljs-string">"default"</span> {
  dimensions = [<span class="hljs-string">"http.method"</span>]
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.prometheus <span class="hljs-string">"default"</span> {
  forward_to = [prometheus.remote_write.qryn.receiver]
}

prometheus.remote_write <span class="hljs-string">"qryn"</span> {
  endpoint {
    url = <span class="hljs-string">"https://qryn:3100/api/prom/push"</span>

    basic_auth {
      username = env(<span class="hljs-string">"QRYN_USERNAME"</span>)
      password = env(<span class="hljs-string">"QRYN_PASSWORD"</span>)
    }
  }
}

otelcol.exporter.otlp <span class="hljs-string">"qryn_tempo"</span> {
  client {
    endpoint = <span class="hljs-string">"https://qryn:3100"</span>
    auth     = otelcol.auth.basic.qryn_tempo.handler
  }
}

otelcol.auth.basic <span class="hljs-string">"qryn_tempo"</span> {
  username = env(<span class="hljs-string">"QRYN_USERNAME"</span>)
  password = env(<span class="hljs-string">"QRYN_PASSWORD"</span>)
}
</code></pre>
<h3 id="heading-k8s-operator-for-qryn">🔶 K8s Operator for qryn</h3>
<pre><code class="lang-bash">// <span class="hljs-built_in">read</span> the credentials secret <span class="hljs-keyword">for</span> remote_write authorization
remote.kubernetes.secret <span class="hljs-string">"credentials"</span> {
  namespace = <span class="hljs-string">"monitoring"</span>
  name = <span class="hljs-string">"primary-credentials-metrics"</span>
}

prometheus.remote_write <span class="hljs-string">"primary"</span> {
    endpoint {
        url = <span class="hljs-string">"https://qryn:3100/api/v1/push"</span>
        basic_auth {
            username = nonsensitive(remote.kubernetes.secret.credentials.data[<span class="hljs-string">"username"</span>])
            password = remote.kubernetes.secret.credentials.data[<span class="hljs-string">"password"</span>]
        }
    }
}

prometheus.operator.podmonitors <span class="hljs-string">"primary"</span> {
    forward_to = [prometheus.remote_write.primary.receiver]
    // leave out selector to find all podmonitors <span class="hljs-keyword">in</span> the entire cluster
    selector {
        match_labels = {instance = <span class="hljs-string">"primary"</span>}
    }
}

prometheus.operator.servicemonitors <span class="hljs-string">"primary"</span> {
    forward_to = [prometheus.remote_write.primary.receiver]
    // leave out selector to find all servicemonitors <span class="hljs-keyword">in</span> the entire cluster
    selector {
        match_labels = {instance = <span class="hljs-string">"primary"</span>}
    }
}
</code></pre>
<h3 id="heading-k8s-pods-scraper-for-qryn">🔶 K8s Pods Scraper for qryn</h3>
<pre><code class="lang-bash">// Get our API key from disk.
//
// This component has an exported field called <span class="hljs-string">"content"</span>, holding the content
// of the file.
//
// local.file.api_key will watch the file and update its exports any time the
// file changes.
local.file <span class="hljs-string">"api_key"</span> {
  filename  = <span class="hljs-string">"/var/data/secrets/api-key"</span>

  // Mark this file as sensitive to prevent its value from being shown <span class="hljs-keyword">in</span> the
  // UI.
  is_secret = <span class="hljs-literal">true</span>
}

// Create a prometheus.remote_write component, <span class="hljs-built_in">which</span> other components can send
// metrics to.
//
// This component exports a <span class="hljs-string">"receiver"</span> value, <span class="hljs-built_in">which</span> can be used by other
// components to send metrics.
prometheus.remote_write <span class="hljs-string">"prod"</span> {
  endpoint {
    url = <span class="hljs-string">"https://qryn:3100/api/v1/write"</span>

    basic_auth {
      username = <span class="hljs-string">"admin"</span>

      // Use the password file to authenticate with the production database.
      password = local.file.api_key.content
    }
  }
}

// Find Kubernetes pods <span class="hljs-built_in">where</span> we can collect metrics.
//
// This component exports a <span class="hljs-string">"targets"</span> value, <span class="hljs-built_in">which</span> contains the list of
// discovered pods.
discovery.kubernetes <span class="hljs-string">"pods"</span> {
  role = <span class="hljs-string">"pod"</span>
}

// Collect metrics from Kubernetes pods and send them to prod.
prometheus.scrape <span class="hljs-string">"default"</span> {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.prod.receiver]
}
</code></pre>
<h3 id="heading-loki-prometheus-forwarder-to-qryn">🔶 Loki + Prometheus forwarder to qryn</h3>
<pre><code class="lang-bash">prometheus.scrape <span class="hljs-string">"metrics_test_local_agent"</span> {
    targets = [{
        __address__ = <span class="hljs-string">"127.0.0.1:12345"</span>,
        cluster     = <span class="hljs-string">"localhost"</span>,
    }]
    forward_to      = [prometheus.remote_write.metrics_test.receiver]
    job_name        = <span class="hljs-string">"local-agent"</span>
    scrape_interval = <span class="hljs-string">"15s"</span>
}

prometheus.remote_write <span class="hljs-string">"metrics_test"</span> {
    endpoint {
        name = <span class="hljs-string">"qryn"</span>
        url  = <span class="hljs-string">"https://qryn:3100/api/prom/push"</span>

        basic_auth {
            username = <span class="hljs-string">"&lt;USERNAME&gt;"</span>
            password = <span class="hljs-string">"&lt;PASSWORD&gt;"</span>
        }

        queue_config { }

        metadata_config { }
    }
}

local.file_match <span class="hljs-string">"logs_varlogs_varlogs"</span> {
    path_targets = [{
        __address__ = <span class="hljs-string">"localhost"</span>,
        __path__    = <span class="hljs-string">"/var/log/*.log"</span>,
        host        = <span class="hljs-string">"mylocalhost"</span>,
        job         = <span class="hljs-string">"varlogs"</span>,
    }]
}

loki.process <span class="hljs-string">"logs_varlogs_varlogs"</span> {
    forward_to = [loki.write.logs_varlogs.receiver]

    stage.match {
        selector = <span class="hljs-string">"{filename=\"/var/log/*.log\"}"</span>

        stage.drop {
            expression = <span class="hljs-string">"^[^0-9]{4}"</span>
        }

        stage.regex {
            expression = <span class="hljs-string">"^(?P&lt;timestamp&gt;\\d{4}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2}) \\[(?P&lt;level&gt;[[:alpha:]]+)\\] (?:\\d+)\\#(?:\\d+): \\*(?:\\d+) (?P&lt;message&gt;.+)$"</span>
        }

        stage.pack {
            labels           = [<span class="hljs-string">"level"</span>]
            ingest_timestamp = <span class="hljs-literal">false</span>
        }
    }
}

loki.source.file <span class="hljs-string">"logs_varlogs_varlogs"</span> {
    targets    = local.file_match.logs_varlogs_varlogs.targets
    forward_to = [loki.process.logs_varlogs_varlogs.receiver]

    file_watch {
        min_poll_frequency = <span class="hljs-string">"1s"</span>
        max_poll_frequency = <span class="hljs-string">"5s"</span>
    }
}

loki.write <span class="hljs-string">"logs_varlogs"</span> {
    endpoint {
        url = <span class="hljs-string">"https://qryn:3100/loki/api/v1/push"</span>
    }
    external_labels = {}
}
</code></pre>
<p>There's so much more you can do with Alloy and qryn combined. Let us know your ideas, feedback and experience through our <a target="_blank" href="https://github.com/metrico/qryn">Github repository</a>.</p>
<h2 id="heading-get-polyglot">🔶 Get Polyglot</h2>
<p><a target="_blank" href="https://github.com/metrico/qryn">qryn</a> is the observability system you've been waiting for - it's <em>free and open source</em></p>
<p><em>Logs, Metrics, Traces, Continuous Profiling. All of the power, none of the stress!</em></p>
<p><a target="_blank" href="https://qryn.metrico.in/#/"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[🦗 Odigos + qryn = zero instrumentation]]></title><description><![CDATA[Detect and Fix Production Issues Faster with Odigos & qryn
Odigos automatically instruments your applications to produce distributed traces, metrics and logs for any Kubernetes application in minutes, without any code changes.
Odigos detects the programming la...]]></description><link>https://blog.gigapipe.com/odigos-qryn-zero-instrumentation</link><guid isPermaLink="true">https://blog.gigapipe.com/odigos-qryn-zero-instrumentation</guid><category><![CDATA[odigos]]></category><category><![CDATA[eBPF]]></category><category><![CDATA[k8s]]></category><category><![CDATA[observability]]></category><category><![CDATA[OpenTelemetry]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Tue, 27 Feb 2024 23:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1668372625515/E47aZoupa.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1668371603629/YNQ9pgxne.png" alt /></p>
<h4 id="heading-detect-and-fix-production-issues-faster-with-odigos-amp-qryn">Detect and Fix Production Issues Faster with Odigos &amp; qryn</h4>
<p><a target="_blank" href="https://github.com/keyval-dev/odigos">Odigos</a> automatically instruments your applications to produce distributed traces, metrics and logs for any Kubernetes application in minutes, without any code changes.</p>
<p><strong>Odigos</strong> detects the programming language of your applications and applies automatic instrumentation using well-known, battle-tested open source observability technologies such as <strong>OpenTelemetry</strong> and <strong>eBPF</strong>.</p>
<h2 id="heading-tutorial">Tutorial</h2>
<p>In this tutorial we are going to use <strong>Odigos</strong> to get automatic observability for a microservices application written in <em>Go, Java, Python, .NET and Node</em>.</p>
<p>Odigos v0.1.36+ natively supports <strong>qryn</strong> as a destination for <em>traces, logs and metrics</em>.</p>
<p>📚 This guide is adapted from the <a target="_blank" href="https://odigos.io/docs/getting-started/#choosing-where-to-send-the-data">odigos documentation examples</a></p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p>To follow the guide, you need the following:</p>
<ul>
<li><p>A <strong>Kubernetes</strong> cluster.</p>
</li>
<li><p><strong>Helm CLI</strong> for installing helm charts.</p>
</li>
<li><p>A <strong>qryn</strong> or <strong>qryn.cloud</strong> deployment.</p>
</li>
</ul>
<h3 id="heading-creating-the-kubernetes-cluster">Creating the Kubernetes cluster</h3>
<p>Create a new local Kubernetes cluster by running the following command:</p>
<pre><code class="lang-plaintext">kind create cluster
</code></pre>
<h3 id="heading-deploying-the-target-application">Deploying the target application</h3>
<p>For this tutorial, we are going to install a fork of <a target="_blank" href="https://github.com/keyval-dev/microservices-demo">microservices-demo</a>. We use a modified version without any instrumentation code to demonstrate how Odigos automatically collects observability data from the application.</p>
<p>Deploy the demo application using the following command:</p>
<pre><code class="lang-plaintext">kubectl apply -f https://raw.githubusercontent.com/keyval-dev/microservices-demo/master/release/kubernetes-manifests.yaml
</code></pre>
<p>Before proceeding, make sure that <em>all the application pods are running.</em></p>
<h2 id="heading-installing-odigos">Installing Odigos</h2>
<p>The easiest way to install <strong>Odigos</strong> is to use the official helm chart:</p>
<pre><code class="lang-plaintext">helm repo add odigos https://keyval-dev.github.io/odigos-charts/
helm install my-odigos odigos/odigos --namespace odigos-system --create-namespace
</code></pre>
<p>After all the pods in the <code>odigos-system</code> namespace are running, open the Odigos UI by running the following command and navigating to <code>http://localhost:3000</code>:</p>
<pre><code class="lang-plaintext">kubectl port-forward svc/odigos-ui 3000:3000 -n odigos-system
</code></pre>
<h3 id="heading-choosing-where-to-send-the-data">Choosing where to send the data</h3>
<p>You should now see the following page:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1668370960702/n3KKkeh-_.png" alt="image.png" /></p>
<p>Once <strong>Odigos</strong> has detected all the applications in the cluster, choose the <code>opt out</code> option for application instrumentation. <code>opt in</code> mode is recommended when you want greater control over which applications are instrumented.</p>
<p>On the next page, select <code>qryn</code> as the destination for the data:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670182473440/8fBxmI7Yi.png" alt /></p>
<p>Fill in the following information using your relevant <strong>qryn</strong> details:</p>
<ul>
<li><p><strong>Tempo URL</strong>: <code>https://qryn.host/tempo/api/push</code></p>
</li>
<li><p><strong>Prometheus URL</strong>: <code>https://qryn.host/api/prom/remote/write</code></p>
</li>
<li><p><strong>Loki URL</strong>: <code>https://qryn.host/loki/api/v1/push</code></p>
</li>
</ul>
<h3 id="heading-generating-data">Generating data</h3>
<p>That’s it! Odigos will automatically do the following:</p>
<ul>
<li><p>Instrument all the applications in the cluster:</p>
<ul>
<li><p>Runtime languages will be instrumented using <strong>OpenTelemetry</strong>.</p>
</li>
<li><p>Compiled languages will be instrumented using <strong>eBPF</strong>.</p>
</li>
</ul>
</li>
<li><p>Deploy and configure a collector to send the data to <strong>qryn</strong>.</p>
</li>
</ul>
<p>Now all that is left is to generate some traffic in the application.</p>
<p>Execute the following command to port forward into the application UI:</p>
<pre><code class="lang-plaintext">kubectl port-forward svc/frontend 1234:80 -n default
</code></pre>
<p>Navigate to <code>http://localhost:1234</code> and perform some fake purchases.</p>
<h3 id="heading-exploring-the-collected-data">Exploring the collected data</h3>
<p>Within minutes, you should see distributed traces appear in <strong>qryn</strong>. You now have all the data needed to understand how your application is behaving, without having to do any additional work. Using this configuration any new application deployed to this Kubernetes cluster will automatically be instrumented and sent to <strong>qryn</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1668430911556/xNjwxVD-R.png" alt="image.png" /></p>
<h4 id="heading-cleanup">Cleanup</h4>
<p>Delete the Kubernetes cluster by running the following command:</p>
<pre><code class="lang-plaintext">kind delete cluster
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p><strong>Odigos</strong> is pretty amazing at generating distributed traces, metrics and logs for any K8s application in minutes, and <strong>qryn</strong> supports it transparently, acting as a polyglot backend for all data types without wasting your time on complexity.</p>
<p>Kudos to team <a target="_blank" href="https://keyval.dev">keyval</a> for this fantastic project. Check out <a target="_blank" href="https://odigos.io/">Odigos Cloud,</a> too!</p>
<p><em>Have fun instrumenting your real Applications, and please share your comments!</em></p>
<h3 id="heading-qryn-cloud">🌥 qryn cloud</h3>
<p>Try this example and many more from the comfort of your screen using <a target="_blank" href="https://qryn.cloud"><strong>qryn cloud</strong></a></p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt /></a></p>
]]></content:encoded></item><item><title><![CDATA[🦟 Fly.io Observability with qryn]]></title><description><![CDATA[Fly is a platform for running full stack apps and databases close to your users. We’ve been hammering on this thing since 2017, and we think it’s pretty great.

Lots of smart people love Fly.io and run their apps on this great platform!
Each Fly Apps...]]></description><link>https://blog.gigapipe.com/flyio-observability-with-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/flyio-observability-with-qryn</guid><category><![CDATA[fly.io]]></category><category><![CDATA[qryn]]></category><category><![CDATA[observability]]></category><category><![CDATA[Logs]]></category><category><![CDATA[metrics]]></category><category><![CDATA[distributed tracing]]></category><category><![CDATA[Grafana]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Sat, 20 Jan 2024 11:18:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705748611668/90c0dade-e148-4e69-acd6-74cc70fb682f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://fly.io"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705710706485/97499114-600f-4912-a883-69774267b4b4.png" alt /></a></p>
<blockquote>
<p>Fly is a platform for running full stack apps and databases close to your users. We’ve been hammering on this thing since 2017, and we think it’s pretty great.</p>
</blockquote>
<p><em>Lots of smart people love Fly.io and run their apps on this great platform!</em></p>
<p>Each <strong>Fly App</strong> produces logs, helpful if not essential for a variety of use cases - <em>debugging, tracking, collating, correlating, coalescing, and condensing</em> the happenings of your running code into useful bits of human-parsable information.</p>
<p>But since <a target="_blank" href="http://Fly.io">Fly.io</a> <em>doesn't keep your logs around forever</em>, to make the best of them we can store and explore the data using services such as <a target="_blank" href="https://qryn.dev">qryn</a> and <a target="_blank" href="https://qryn.cloud">qryn.cloud</a>, without losing any of the <strong>Grafana</strong> compatibility and Dashboards Fly.io offers by default.</p>
<p><em><mark>How can we do that</mark>?</em> Luckily for us fly.io is always super elegant and very generously <a target="_blank" href="https://fly.io/blog/shipping-logs/">routes all Firecracker VM logs through NATS</a>, ready to be picked up and processed.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">👊 Thanks to our good friend <a target="_blank" href="https://twitter.com/dan_the_goodman?lang=en">Dan The Goodman</a> from <a target="_blank" href="https://www.tangia.co/">Tangia</a> for suggesting this smart approach and for inspiring this guide! Check out his projects!</div>
</div>

<h3 id="heading-fly-higher-with-qryn">🎈 Fly higher with qryn</h3>
<p>To get logs all you need is an app that acts as a NATS client, reads the logs, and ships them somewhere. <strong>Vector</strong> can do just that and since it supports <strong>qryn</strong> natively through the <strong>Loki</strong> and <strong>Prometheus</strong> sinks, <em>it works out of the box with our example.</em></p>
<h2 id="heading-using-the-log-shipper"><strong>Using the Log Shipper</strong></h2>
<p>For our example we'll use the <a target="_blank" href="https://github.com/superfly/fly-log-shipper">fly-log-shipper example</a>, extended for <strong>qryn</strong>.</p>
<p>The fly <strong>NATS</strong> log stream is scoped to your user organization, which means the Fly Log Shipper collects logs from <em>all</em> your applications at once. <em>Very practical.</em></p>
<p>Here's a quick example using a custom <strong>qryn</strong> sink for shipping <strong>logs</strong> and <strong>metrics</strong>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Make a directory for qryn log shipper</span>
mkdir logshipper
<span class="hljs-built_in">cd</span> logshipper

<span class="hljs-comment"># I chose not to deploy yet</span>
fly launch --image qxip/fly-log-shipper:latest

<span class="hljs-comment"># Set secrets to enable qryn integration</span>
fly secrets <span class="hljs-built_in">set</span> ORG=personal
fly secrets <span class="hljs-built_in">set</span> ACCESS_TOKEN=$(fly auth token)
fly secrets <span class="hljs-built_in">set</span> QRYN_URL=&lt;qryn API URL&gt;
fly secrets <span class="hljs-built_in">set</span> QRYN_USERNAME=&lt;qryn API user&gt;
fly secrets <span class="hljs-built_in">set</span> QRYN_PASSWORD=&lt;qryn API token&gt;
</code></pre>
<p>Before launching your application, you should edit the generated <code>fly.toml</code> file and delete the entire <code>[[services]]</code> section. Replace it with this:</p>
<pre><code class="lang-ini"><span class="hljs-section">[[services]]</span>
  <span class="hljs-attr">http_checks</span> = []
  <span class="hljs-attr">internal_port</span> = <span class="hljs-number">8686</span>
</code></pre>
<p>Once ready deploy the <strong>qryn log shipper</strong> application to your stack:</p>
<pre><code class="lang-bash">fly deploy
</code></pre>
<p>You’ll soon start to see <strong>logs</strong> appear from all of your apps:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705672682151/3f34fbda-4250-4610-8fdb-dde5fcb3336a.png" alt /></p>
<blockquote>
<p><em>NOTE: shipping logs to qryn does</em> <strong><em>not</em></strong> <em>interrupt the native logging feature!</em></p>
</blockquote>
<p><strong><em><mark>What's happening under the hood?</mark></em></strong></p>
<ul>
<li><p><strong>Vector</strong> receives all the organization logs through <strong>NATS</strong></p>
</li>
<li><p><strong>NATS</strong> sourced logs are parsed and labeled for the Loki sink</p>
</li>
<li><p><strong>Vector</strong> ships parsed logs to a Loki sink pointed at <strong>qryn</strong> or <strong>qryn.cloud</strong></p>
</li>
</ul>
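<p>As a rough illustration of that middle step, here's the kind of reshaping Vector performs, sketched in JavaScript: a NATS-delivered log record becomes a Loki push payload of labels plus timestamped lines. <em>The record fields and label names below are illustrative assumptions, not the shipper's exact schema.</em></p>
<pre><code class="lang-javascript">// Hypothetical Fly log record as delivered over NATS (fields are illustrative)
const record = {
  fly: { app: { name: 'my-app', instance: '1781344b' }, region: 'ams' },
  log: { level: 'info' },
  message: 'request completed',
  timestamp: '2024-01-20T11:18:28.000Z'
};

// Shape it into a Loki push payload: streams of labels + [ns-timestamp, line] pairs
function toLokiPayload(rec) {
  const labels = {
    app: rec.fly.app.name,
    instance: rec.fly.app.instance,
    region: rec.fly.region,
    level: rec.log.level
  };
  const ns = String(Date.parse(rec.timestamp)) + '000000'; // milliseconds to nanoseconds
  return { streams: [{ stream: labels, values: [[ns, rec.message]] }] };
}

const payload = toLokiPayload(record);
// POSTing this JSON to /loki/api/v1/push is what the Loki sink does for us
console.log(JSON.stringify(payload));
</code></pre>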
<h3 id="heading-fly-metrics">Fly Metrics</h3>
<p>Fly.io produces granular metrics for all running applications using an internal shared Prometheus and Grafana service. These seem to have a long retention, but you might still want to pull them into your own stack for further processing and analysis.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705749285513/1f25ac41-4625-47b8-a150-4678f7ea0ed3.png" alt class="image--center mx-auto" /></p>
<p>To scrape those <strong>metrics into our custom solution</strong> we can extend the existing Vector configuration to scrape all of our organization's timeseries, using the same Fly.io access token used for our NATS connection and a query matching all apps.</p>
<pre><code class="lang-ini"><span class="hljs-section">[sources.flyio_metrics]</span>
  <span class="hljs-attr">type</span> = <span class="hljs-string">"prometheus_scrape"</span>
  <span class="hljs-attr">endpoints</span> = [ <span class="hljs-string">"https://api.fly.io/prometheus/personal/federate"</span> ]
  <span class="hljs-attr">auth.strategy</span> = <span class="hljs-string">'bearer'</span>
  <span class="hljs-attr">auth.token</span> = <span class="hljs-string">"${ACCESS_TOKEN?}"</span>
  <span class="hljs-attr">query.match</span> = [<span class="hljs-string">'{app=~".+"}'</span>]
</code></pre>
<p>The resulting metrics are shipped to <strong>qryn</strong> using a standard <strong>remote_write</strong> API. Easy!</p>
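<p>For completeness, the matching sink could look something like this fragment (a sketch assuming the qryn <strong>remote_write</strong> endpoint and the same secrets set earlier; adjust names to your own configuration):</p>
<pre><code class="lang-ini">[sinks.qryn_metrics]
  type = "prometheus_remote_write"
  inputs = [ "flyio_metrics" ]
  endpoint = "${QRYN_URL?}/api/prom/remote/write"
  auth.strategy = "basic"
  auth.user = "${QRYN_USERNAME?}"
  auth.password = "${QRYN_PASSWORD?}"
</code></pre>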
<p><em>To reproduce the full experience, export &amp; recycle Fly's Grafana dashboards.</em></p>
<h3 id="heading-olap-in-one">OLAP-in-One</h3>
<p><strong>That's it!</strong> You're now ready to query your logs using <strong>Loki/LogQL</strong> and your metrics using <strong>Prometheus/PromQL</strong> and ready to receive Traces and Continuous Profiling events from within your App code using <strong>Tempo</strong> and <strong>Pyroscope</strong> compatibility.</p>
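<p>For example, a couple of starter queries (sketches - the exact label and metric names depend on your shipper and scrape configuration; <code>app</code> here mirrors the <code>query.match</code> selector used above):</p>
<pre><code class="lang-plaintext"># LogQL: error lines across all shipped apps
{app=~".+"} |= "error"

# PromQL: per-app rate of an illustrative counter over 5 minutes
sum by (app) (rate(some_requests_total[5m]))
</code></pre>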
<p><em>Using the same process you can ship logs, metrics or traces from any stack.</em></p>
<h3 id="heading-get-polyglot">Get Polyglot</h3>
<p>Tired of unnecessary complexity? Join our <strong>polyglot observability</strong> journey today and regain control of your data with <a target="_blank" href="https://qryn.dev">qryn oss</a> or <a target="_blank" href="https://qryn.cloud">qryn.cloud</a> for all of your <strong>Logs, Metrics, Traces</strong> and <strong>Profiling</strong>, saving a ton in cost and resources without forcing your teams to learn new query languages or change their tools.</p>
<p>Which stack would you like to see next? Let us know in the comments.</p>
<p><a target="_blank" href="https://qryn.cloud"><img src="https://github.com/metrico/qryn-docs/assets/1423657/a5164f98-d3ed-4638-afe5-c87d252c74af" alt /></a></p>
]]></content:encoded></item><item><title><![CDATA[🛶 Replacing Pyroscope with qryn + otel]]></title><description><![CDATA[If you work with observability and continuous profiling you surely have already heard of, use and/or have tried Grafana Pyroscope

Grafana Pyroscope is an open source software project for aggregating continuous profiling data. Continuous profiling is...]]></description><link>https://blog.gigapipe.com/pyroscope-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/pyroscope-qryn</guid><category><![CDATA[pprof]]></category><category><![CDATA[pyroscope]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[observability]]></category><category><![CDATA[OLAP]]></category><category><![CDATA[qryn]]></category><category><![CDATA[continuous profiling]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Wed, 17 Jan 2024 10:16:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705486300985/26dc0a86-3401-4aa6-b1c1-dbb2749c710f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://qryn.cloud"><img src="https://grafana.com/docs/pyroscope/latest/logo.png" alt class="image--center mx-auto" /></a></p>
<p>If you work with observability and continuous profiling you surely have already heard of, use and/or have tried <strong>Grafana Pyroscope</strong></p>
<blockquote>
<p><strong><em>Grafana Pyroscope</em></strong> <em>is an open source software project for aggregating continuous profiling data. Continuous profiling is an observability signal that allows you to understand your workload’s resources (CPU, memory, etc.) usage down to the line number.</em></p>
</blockquote>
<p><strong>Grafana</strong> acquired <strong>Pyroscope</strong> and merged it with its <strong>Phlare</strong> continuous profiling database back in 2023. The smooth integration of <strong>Grafana Pyroscope</strong> with <strong>Grafana</strong> is one of the solution's main benefits, allowing easy correlation of <strong>continuous profiling</strong> data with other observability signals, such as <strong>metrics, logs,</strong> and <strong>traces</strong>.</p>
<p><em>This is fantastic</em> until we are reminded that Grafana products force their users to maintain completely different backends and datastores for their <em>"correlated"</em> products, pushing lots of complexity and tedious management onto opensource integrations. So if you need logs, metrics, traces and profiling - <em>those are 4 parallel deployments.</em></p>
<p>This is where <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> kicks in! <mark>One powerful stack, </mark> <em><mark>fluent in all languages.</mark></em></p>
<h3 id="heading-another-pyroscope">Another Pyroscope</h3>
<p>Our observability stack <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> already offers native compatibility with <strong>Grafana Loki, Mimir/Prometheus</strong> and <strong>Tempo</strong> as part of its Polyglot API on top of OLAP databases and we're happy to announce <strong>Pyroscope</strong> API support has joined the family!</p>
<p>Piece by piece, we integrated all the parts and protocols to make the integration work in qryn, with the generous cooperation of community resident <strong>Tomer Shafir</strong> ⭐</p>
<p>Profiling data now lives within the same OLAP database as logs, metrics and traces, providing integrators with full access and control for predictable cost/performance.</p>
<p><em>That's the polyglot</em> approach of <strong>qryn</strong>. <a target="_blank" href="https://qryn.dev"><em><mark>More features. Less services. One datastore.</mark></em></a></p>
<h3 id="heading-get-started">Get Started</h3>
<p>Users and integrators of <a target="_blank" href="https://qryn.dev"><strong>qryn 3.0</strong></a> can already ingest <strong>continuous profiling data</strong> in their favourite OLAP database by using the <a target="_blank" href="https://github.com/metrico/otel-collector">qryn <strong>opentelemetry</strong> collector</a> integration for java clients <em>(soon to be extended to support a broader scope of pprof clients)</em> and query profiling data in <strong>Grafana</strong> through the <strong>Pyroscope Datasource</strong> using <em>qryn's</em> APIs.</p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705411909102/5085dca1-8bb8-43c0-bed2-a78dcad46e89.png" alt class="image--center mx-auto" /></a></p>
<p>The search and filtering experience is fully compatible with the <em>original Pyroscope API</em>, and can be used instantly without any additional knowledge, plugins or hacks.</p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705485625334/ed6ce7a3-2935-4496-9b5d-f8dc2e20c5ac.gif" alt class="image--center mx-auto" /></a></p>
<h3 id="heading-update-082024">UPDATE 08/2024</h3>
<p><strong>qryn</strong> is fully compatible with the <strong>Pyroscope Explore App</strong> in Grafana!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725749132907/487fd2a4-32c5-4a68-9110-6c527b39e413.gif" alt class="image--center mx-auto" /></p>
<p>With the opensource front already underway, the integration with <a target="_blank" href="https://qryn.cloud">qryn.cloud</a> is up next, with a focus on total integration with all available <a target="_blank" href="https://gigapipe.com">gigapipe</a> products, as well as any external customer resource through collectors and agents. <em>Watch this space!</em></p>
<p><strong><em>Updated Workflow</em></strong></p>
<p>Here's the upcoming qryn workflow updated with Profiling Features. <em>To the future!</em></p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705494697945/e3711142-9aac-493f-814e-4fc00b32e8b8.gif" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[🔎 React App observability with otel + qryn]]></title><description><![CDATA[React - also referred to as React.js or ReactJS - is a JavaScript library for creating user interfaces through UI components. This React library is open-source and free to use, maintained by Meta and groups of independent developers and businesses. A...]]></description><link>https://blog.gigapipe.com/react-app-observability-with-otel-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/react-app-observability-with-otel-qryn</guid><category><![CDATA[OpenTelemetry]]></category><category><![CDATA[React]]></category><category><![CDATA[observability]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[qryn]]></category><category><![CDATA[monitoring tool]]></category><category><![CDATA[library]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Mon, 04 Dec 2023 17:52:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1701705463014/b9456a92-674f-4d03-a580-972882664f82.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>React</strong> - also referred to as <em>React.js or ReactJS</em> - is a <strong>JavaScript library</strong> for creating user interfaces through UI components. This React library is open-source and free to use, maintained by Meta and groups of independent developers and businesses. As per any webapplication running remotely, <em>troubleshooting React can be challenging.</em></p>
<p><strong>Opentelemetry</strong> is a library ecosystem that implements instrumentation for common libraries and frameworks, automatically generating <strong>end-to-end telemetry</strong> data without requiring major code changes.</p>
<p><strong>React</strong> and <strong>Opentelemetry both</strong> provide client side coverage. A trace collector is required to receive and index traces. Our polyglot stack <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> is designed to be 100% compatible with Opentelemetry standards and supports ingestion via Otel Collectors or directly through the <a target="_blank" href="https://qryn.metrico.in/#/support?id=otel-collector"><em>built-in OTEL ingestion API</em></a><em>, with no additional middleware.</em></p>
<p><strong><mark>"Webservability"</mark></strong> is what happens when you mix these amazing technologies!</p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://opentelemetry.io/img/social/logo-wordmark-001.png" alt /></a></p>
<h3 id="heading-opentelemetry-react">Opentelemetry + React</h3>
<p><strong>Opentelemetry web</strong> libraries can be used to instrument and trace React applications, enabling developers to identify and resolve performance issues and bugs, and to track user requests from the frontend to backend services and back.</p>
<h3 id="heading-requirements"><strong>Requirements</strong></h3>
<p>Let's install the necessary <strong>Opentelemetry</strong> libraries as first step:</p>
<pre><code class="lang-bash">npm install @opentelemetry/api @opentelemetry/sdk-trace-web @opentelemetry/exporter-trace-otlp-http @opentelemetry/auto-instrumentations-web
</code></pre>
<h3 id="heading-manual-instrumentation">⚛️ <strong>Manual Instrumentation</strong></h3>
<p>After installing the libraries, the next step is to configure our Opentelemetry exporter. Here's a basic setup for <code>@opentelemetry/exporter-trace-otlp-http</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { OTLPTraceExporter } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/exporter-trace-otlp-http'</span>;
<span class="hljs-keyword">import</span> { WebTracerProvider } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/sdk-trace-web'</span>;
<span class="hljs-keyword">import</span> { SimpleSpanProcessor } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/sdk-trace-base'</span>;
<span class="hljs-comment">// Initialize OTLP Exporter</span>
<span class="hljs-keyword">const</span> otlpExporter = <span class="hljs-keyword">new</span> OTLPTraceExporter({
  <span class="hljs-attr">url</span>: <span class="hljs-string">'https://qryn:3100/v1/traces'</span>
});
<span class="hljs-comment">// Initialize the Tracer Provider</span>
<span class="hljs-keyword">const</span> provider = <span class="hljs-keyword">new</span> WebTracerProvider();
<span class="hljs-comment">// Add OTLP Exporter to the provider</span>
provider.addSpanProcessor(<span class="hljs-keyword">new</span> SimpleSpanProcessor(otlpExporter));
<span class="hljs-comment">// Register the provider</span>
provider.register();
</code></pre>
<p>You can now create <strong>custom spans</strong> in your components to trace specific operations or user interactions by wrapping the spans around your code of interest.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { trace } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/api'</span>;
<span class="hljs-keyword">const</span> tracer = trace.getTracer(<span class="hljs-string">'your-tracer-name'</span>);
<span class="hljs-keyword">const</span> MyComponent = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">const</span> span = tracer.startSpan(<span class="hljs-string">'MyComponentRender'</span>);
  <span class="hljs-comment">/* 
     Perform some operations 
  */</span>
  span.end();
  <span class="hljs-keyword">return</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>My React Component<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>;
};
</code></pre>
<p>The configured OTLP exporter will send the collected traces to the specified OTEL API endpoint - in our case either <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> or <a target="_blank" href="https://qryn.cloud"><strong>qryn.cloud</strong></a> (<a target="_blank" href="https://qryn:3100/v1/traces"><code>https://qryn:3100/v1/traces</code></a>)</p>
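<p>For the curious, the OTLP/HTTP exporter serializes spans using the OTLP JSON mapping. Here's a trimmed sketch of roughly what gets POSTed to <code>/v1/traces</code> (the IDs, names and timestamps below are illustrative, and real payloads carry additional fields):</p>
<pre><code class="lang-javascript">// Approximate shape of an OTLP/HTTP JSON body sent to /v1/traces
const body = {
  resourceSpans: [{
    resource: {
      attributes: [{ key: 'service.name', value: { stringValue: 'my-react-app' } }]
    },
    scopeSpans: [{
      scope: { name: 'your-tracer-name' },
      spans: [{
        traceId: '5b8aa5a2d2c872e8321cf37308d69df2', // 16 bytes, hex-encoded
        spanId: '051581bf3cb55c13',                  // 8 bytes, hex-encoded
        name: 'MyComponentRender',
        kind: 1, // SPAN_KIND_INTERNAL
        startTimeUnixNano: '1701705463000000000',
        endTimeUnixNano: '1701705463005000000'
      }]
    }]
  }]
};

console.log(JSON.stringify(body, null, 2));
</code></pre>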
<h3 id="heading-auto-instrumentation">⚛️ <strong>Auto-Instrumentation</strong></h3>
<p><strong>Auto-instrumentation</strong> in the context of a React application using <strong>Opentelemetry</strong> primarily revolves around <strong>automatically capturing relevant telemetry</strong> data like user interactions, component render times, and API requests. This can be particularly useful as it <em>minimizes the manual instrumentation code</em> you need to write and maintain. Here's an example of how you can set up auto-instrumentation.</p>
<p>To begin, you need to install the <code>@opentelemetry/auto-instrumentations-web</code> package, which provides auto-instrumentation capabilities for web applications.</p>
<pre><code class="lang-bash">npm install @opentelemetry/auto-instrumentations-web
</code></pre>
<p>After installing the package, you can configure your <strong>Opentelemetry</strong> setup to automatically instrument your React application. Here's an example setup:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { WebTracerProvider } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/sdk-trace-web'</span>;
<span class="hljs-keyword">import</span> { OTLPTraceExporter } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/exporter-trace-otlp-http'</span>;
<span class="hljs-keyword">import</span> { SimpleSpanProcessor } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/sdk-trace-base'</span>;
<span class="hljs-keyword">import</span> { registerInstrumentations } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/instrumentation'</span>;
<span class="hljs-keyword">import</span> { DocumentLoadInstrumentation, UserInteractionInstrumentation } <span class="hljs-keyword">from</span> <span class="hljs-string">'@opentelemetry/auto-instrumentations-web'</span>;

<span class="hljs-comment">// Initialize OTLP Exporter</span>
<span class="hljs-keyword">const</span> otlpExporter = <span class="hljs-keyword">new</span> OTLPTraceExporter({
  <span class="hljs-attr">url</span>: <span class="hljs-string">'https://qryn:3100/v1/traces'</span>
});
<span class="hljs-comment">// Initialize the Tracer Provider</span>
<span class="hljs-keyword">const</span> provider = <span class="hljs-keyword">new</span> WebTracerProvider();
<span class="hljs-comment">// Add OTLP Exporter to the provider</span>
provider.addSpanProcessor(<span class="hljs-keyword">new</span> SimpleSpanProcessor(otlpExporter));
<span class="hljs-comment">// Register the provider</span>
provider.register();
<span class="hljs-comment">// Auto-instrumentations</span>
registerInstrumentations({
  <span class="hljs-attr">instrumentations</span>: [
    <span class="hljs-keyword">new</span> DocumentLoadInstrumentation(),
    <span class="hljs-keyword">new</span> UserInteractionInstrumentation(),
  ],
  <span class="hljs-attr">tracerProvider</span>: provider,
});
</code></pre>
<p>When using <strong>auto-instrumentations</strong>, your React application will automatically <em>generate and export traces</em> for document loads and user interactions, providing valuable insights into your application's internals <em>with minimal manual setup.</em></p>
<h3 id="heading-happy-conclusion"><strong>Happy Conclusion</strong></h3>
<p><strong>OpenTelemetry</strong> in <strong>React</strong> combined with the polyglot features of <a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> or <a target="_blank" href="https://qryn.cloud"><strong>qryn.cloud</strong></a> delivers one of the fastest and lightest end-to-end observability pairings ever!</p>
<p><a target="_blank" href="https://qryn.dev"><strong>qryn</strong></a> supports OpenTelemetry ingestion natively and through collectors, reducing project requirements and moving parts to a minimum - <em>with maximum results.</em></p>
<p><a target="_blank" href="https://qryn.dev"><em>That's it! Give it a try and share your feedback and suggestions!</em></a></p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://user-images.githubusercontent.com/1423657/218818279-3efff74f-0191-498a-bdc4-f2650c9d3b49.gif" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[📞 WebRTC Observability with qryn]]></title><description><![CDATA[In this interesting medium article, RTC veteran Vittorio Palmisano shows his audience how to simplify the debugging process of WebRTC applications by exporting all RTCPeerConnections metrics generated by Browser sessions out to a Prometheus + PushGat...]]></description><link>https://blog.gigapipe.com/webrtc-observability-with-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/webrtc-observability-with-qryn</guid><category><![CDATA[WebRTC]]></category><category><![CDATA[debugging]]></category><category><![CDATA[observability]]></category><category><![CDATA[chrome extension]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[#prometheus]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Mon, 04 Dec 2023 01:41:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727377698166/06bfa2ee-3a66-4abe-a1b7-2a7a78d3391b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this <a target="_blank" href="https://medium.com/@vpalmisano/webrtc-debugging-with-prometheus-grafana-254b6ac71063">interesting medium article</a>, RTC veteran <strong>Vittorio Palmisano</strong> shows his audience how to simplify the debugging process of <strong>WebRTC applications</strong> by exporting all <code>RTCPeerConnections</code> metrics generated by Browser sessions out to a Prometheus + PushGateway service using a custom built browser extension.</p>
<p><em>If you have Prometheus and PushGateway up, that's great. If you don't...</em></p>
<p><mark>This is </mark> <strong><mark>exactly </mark></strong> <mark>the challenge we designed </mark> <strong><mark>qryn </mark></strong> <mark>for - </mark> <em><mark>so you know what's next!</mark></em></p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701652298981/e2738a04-3f30-47fb-963a-4f5d20ccb3fc.gif" alt class="image--center mx-auto" /></a></p>
<h3 id="heading-requirements">Requirements</h3>
<p>In order to run this experiment you will need the following:</p>
<ul>
<li><p><a target="_blank" href="https://qryn.dev">qryn</a> or <a target="_blank" href="https://qryn.cloud">qryn.cloud</a> setup to ingest and query metrics</p>
</li>
<li><p><a target="_blank" href="https://qryn.cloud">grafana</a> to display metrics using the <a target="_blank" href="https://gist.github.com/lmangani/8302de53ebfd8df5339643e3a74567ed#file-webrtc_internals-json">included dashboard</a></p>
</li>
<li><p>chrome browser with our <a target="_blank" href="https://github.com/lmangani/webrtc-internals-exporter/releases/download/v0.1.10/webrtc-exporter-v0.1.10.zip">custom extension</a> installed</p>
</li>
</ul>
<h3 id="heading-browser-extension-exporter">Browser Extension Exporter</h3>
<p>For this demo, we'll keep things simple. The original article requires a <strong><s>PushGateway </s></strong> to deliver the Prometheus metrics. The InfluxDB line protocol is a valid alternative for the job, allowing direct delivery to a collector such as <strong>qryn</strong> without extra components, <em>so we modified the extension to add this delivery method.</em></p>
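<p>To give some intuition about the format (this is only a rough sketch of ours, not the extension's actual code), serializing a gauge into InfluxDB line protocol boils down to simple string assembly:</p>
<pre><code class="lang-javascript">// Illustrative sketch only: build one InfluxDB line protocol entry.
// Format: measurement,tag1=v1,tag2=v2 field=value timestampNs
function escapeTag(value) {
  // spaces, commas and equal signs must be backslash-escaped in tag values
  return String(value).replace(/[ ,=]/g, function (ch) { return '\\' + ch; });
}

function toLineProtocol(measurement, tags, value, timestampNs) {
  var parts = [];
  for (var key in tags) {
    parts.push(key + '=' + escapeTag(tags[key]));
  }
  var tagSet = parts.length ? ',' + parts.join(',') : '';
  return measurement + tagSet + ' value=' + value + ' ' + timestampNs;
}

toLineProtocol('outbound_rtp_bytesSent', { kind: 'video' }, 12345, 1701652569000000000);
// → 'outbound_rtp_bytesSent,kind=video value=12345 1701652569000000000'
</code></pre>
<p>Each line like this can be POSTed in batches to the collector, which is exactly what makes the protocol such a lightweight fit here.</p>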
<p>👉 <a target="_blank" href="https://github.com/lmangani/webrtc-internals-exporter/releases/download/v0.1.10/webrtc-exporter-v0.1.10.zip">Download and manually install our extension</a> from our GitHub repository</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701652569478/6eaa9bfa-0435-43e7-9a78-e28c517c1c92.png" alt /></p>
<p>Once installed, visit the <strong>extension options</strong> page and fill in the <strong>qryn endpoint</strong> <em>and optional username/password.</em> Other optional settings include:</p>
<ul>
<li><p><strong>Update interval</strong>: changes the metrics collection interval (in seconds).</p>
</li>
<li><p><strong>Enabling gzip compression</strong>: not required, but recommended to reduce the amount of data sent over the network.</p>
</li>
<li><p><strong>Job name</strong>: customizes the <code>job</code> label attached to each metric. You can usually query the exported label in Grafana using the <code>exported_job</code> label selector.</p>
</li>
<li><p><strong>Enabled PeerConnection stats</strong>: the list of metric types you want to collect from the <code>getStats</code> output.</p>
</li>
<li><p><strong>Enabled URL origins</strong>: the list of allowed URLs where the extension will actually collect data.</p>
</li>
</ul>
<p><img src="https://miro.medium.com/v2/resize:fit:1250/1*TVT_Lfk2oNqd1rFShSXKOQ.jpeg" alt /></p>
<p><mark>For a local qryn setup, you will only need the URL parameter: </mark> <strong><mark>http://qryn:3100</mark></strong></p>
<p><strong>Save</strong> your settings and proceed.</p>
<h3 id="heading-talk-to-yourself">Talk to Yourself</h3>
<p><em>Real WebRTC developers and testers always prefer talking to themselves.</em></p>
<p>To join the club, open your Chrome browser and start the <a target="_blank" href="https://janus.conf.meetecho.com/echotest.html"><strong>Janus Echo Test</strong></a>.</p>
<p><strong>Enable</strong> the <strong>WebRTC Internals Exporter</strong> for the Janus website 👇👇👇</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701652896408/a0542b00-18bd-48e5-84dc-60321d59be40.png" alt class="image--center mx-auto" /></p>
<p>Click <code>start</code> to initiate your WebRTC echo session and begin <strong>producing statistics</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701652073663/7dc34f1e-3a62-4ddc-ad6b-3b0fce15dd6a.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701653132853/7aa2b64a-6f5b-4c05-be5a-6e5b7a365d2d.png" alt /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><mark>Keep your echo test going for a few minutes to collect relevant data</mark></div>
</div>

<h3 id="heading-real-time-analytics">📈 Real Time Analytics</h3>
<p>If you followed each step and bad luck didn't get in the way, your metrics should now be ingested into qryn and available through the Prometheus API. All data points and tags are instantly available on the imported preset <a target="_blank" href="https://gist.github.com/lmangani/8302de53ebfd8df5339643e3a74567ed#file-webrtc_internals-json"><strong>WebRTC Dashboard</strong></a> 👈</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701650874492/0bf3bc36-eea7-4d7d-8333-8d546d9e2480.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>The extension converts each numeric value found in the <code>RTCPeerConnections</code> stats into a Gauge metric, using the remaining string properties as labels. E.g. the <strong>bytesSent</strong> value in the <strong>outbound-rtp</strong> stats will be converted to a metric named <strong>outbound_rtp_bytesSent</strong>, with the string properties attached as labels.</p>
</blockquote>
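<p>A minimal sketch of that flattening logic (ours, purely for illustration, not the extension's actual source) could look like this:</p>
<pre><code class="lang-javascript">// Illustrative sketch only: flatten one getStats() entry into gauge metrics.
// Numeric properties become gauge values; string properties become labels.
function statsToGauges(statsType, entry) {
  var prefix = statsType.replace(/-/g, '_'); // 'outbound-rtp' becomes 'outbound_rtp'
  var labels = {};
  var gauges = [];
  for (var key in entry) {
    if (typeof entry[key] === 'string') {
      labels[key] = entry[key];
    }
  }
  for (var key in entry) {
    if (typeof entry[key] === 'number') {
      gauges.push({ name: prefix + '_' + key, value: entry[key], labels: labels });
    }
  }
  return gauges;
}

statsToGauges('outbound-rtp', { bytesSent: 12345, kind: 'video' });
// → [{ name: 'outbound_rtp_bytesSent', value: 12345, labels: { kind: 'video' } }]
</code></pre>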
<p><img src="https://miro.medium.com/v2/resize:fit:1250/1*1VVb2ek_i-eYkHC_0ULzfQ.jpeg" alt /></p>
<p><em><mark>Just imagine combining these reports with server-side Janus Events...</mark></em></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>That's it! With the power of <strong>open source</strong> you have added a powerful asset to your real-time communications troubleshooting toolbox, while enjoying the polyglot capabilities of <a target="_blank" href="https://qryn.cloud"><strong>qryn</strong></a> alongside other RTC integrations such as <a target="_blank" href="https://sipcapture.org"><strong>homer</strong></a> and <a target="_blank" href="https://hepic.cloud"><strong>hepic</strong></a>.</p>
<p><em>Let's get polyglot</em> 💛</p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1701652298981/e2738a04-3f30-47fb-963a-4f5d20ccb3fc.gif" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[Make a Scene... with Grafana 10 + qryn]]></title><description><![CDATA[Grafana Scenes is a new front-end library that enables developers to create dashboard-like experiences — such as querying and transformations, dynamic panel rendering, and time ranges — directly within Grafana application plugins. 
Scenes are collect...]]></description><link>https://blog.gigapipe.com/make-a-scene-with-grafana</link><guid isPermaLink="true">https://blog.gigapipe.com/make-a-scene-with-grafana</guid><category><![CDATA[grafana scenes]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[loki]]></category><category><![CDATA[#prometheus]]></category><category><![CDATA[dashboard]]></category><category><![CDATA[ClickHouse]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Sun, 03 Dec 2023 18:01:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1701701680481/98ed412f-229f-4483-bda9-8db9182b5bd7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Grafana Scenes</strong> is a new front-end library that enables developers to create dashboard-like experiences — such as querying and transformations, dynamic panel rendering, and time ranges — directly within Grafana application plugins. </p>
<p><a target="_blank" href="https://grafana.com/blog/2023/10/31/create-a-logs-app-plugin-with-grafana-scenes-and-grafana-loki"><strong>Scenes</strong></a> are collections of objects representing different UI components, such as data, time ranges, variables, layout, and visualizations. Let's see how we can use them to build a <strong>Log-centric</strong> Scene using a <strong>qryn</strong> <em>Loki compatible data source.</em></p>
<h3 id="heading-lets-make-a-scene">Let's make a Scene</h3>
<p>Let's begin our journey by exploring the <strong>Apps</strong> item in the left-side menu.</p>
<p>The boilerplate Scenes app comes with three routes: <strong>Page with tabs</strong>, <strong>Page with drilldown</strong>, and <strong>Hello world</strong>. We’re going to focus on the <strong>Home</strong> route located in <code>src/pages/Home.tsx</code> and rendered as the default page of our new app plugin.</p>
<p>The most important objects in a <strong>Grafana Scene</strong> are:</p>
<ul>
<li><p><code>SceneApp</code>: Responsible for top-level pages routing.</p>
</li>
<li><p><code>scene.Component</code>: Used in render functions to render your <code>SceneApp</code>.</p>
</li>
</ul>
<p>Let's begin by customizing the title and subtitle of our new <strong>Log Scenes App</strong></p>
<p><img src="https://grafana.com/media/blog/scenes-loki/logs-scenes-app-title-and-subtitle.png" alt="A screenshot of the title and subtitle for the Logs Scenes App" /></p>
<p>We will verify that a qryn <strong>Loki data source</strong> exists and show a notification if it is missing. This helps avoid unforeseen errors and confirms that our setup is ready for developing our new application.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { DataSourceInstanceSettings } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/data'</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">hasLoggingDataSources</span>(<span class="hljs-params">dataSources: Record&lt;string, DataSourceInstanceSettings&gt;</span>) </span>{
     <span class="hljs-keyword">return</span> <span class="hljs-built_in">Object</span>.entries(dataSources).some(<span class="hljs-function">(<span class="hljs-params">[_, ds]</span>) =&gt;</span> ds.type === <span class="hljs-string">'loki'</span>);
}
</code></pre>
<p>Once done, proceed updating your <code>HomePage</code> component.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { useMemo } <span class="hljs-keyword">from</span> <span class="hljs-string">'react'</span>;
<span class="hljs-keyword">import</span> { Alert } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/ui'</span>;
<span class="hljs-keyword">import</span> { config } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/runtime'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> HomePage = <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-keyword">const</span> scene = useMemo(<span class="hljs-function">() =&gt;</span> getScene(), []);
    <span class="hljs-keyword">const</span> hasDataSources = useMemo(<span class="hljs-function">() =&gt;</span> hasLoggingDataSources(config.datasources), []);

    <span class="hljs-keyword">return</span> (
        <span class="xml"><span class="hljs-tag">&lt;&gt;</span>
            {!hasDataSources &amp;&amp; (
            <span class="hljs-tag">&lt;<span class="hljs-name">Alert</span> <span class="hljs-attr">title</span>=<span class="hljs-string">{</span>`<span class="hljs-attr">Missing</span> <span class="hljs-attr">logging</span> <span class="hljs-attr">data</span> <span class="hljs-attr">sources</span>`}&gt;</span>
                    This plugin requires a Loki data source. Please add and configure a Loki data source to your Grafana instance.
            <span class="hljs-tag">&lt;/<span class="hljs-name">Alert</span>&gt;</span>
            )}      
            <span class="hljs-tag">&lt;<span class="hljs-name">scene.Component</span> <span class="hljs-attr">model</span>=<span class="hljs-string">{scene}</span> /&gt;</span>
        <span class="hljs-tag">&lt;/&gt;</span></span>
    );
};
</code></pre>
<p>If your local environment is configured correctly, you should not see any alerts.</p>
<h3 id="heading-scenes-for-a-loki-app"><strong>Scenes for a Loki app</strong></h3>
<p>In this step, our focus will be on the file located in <code>src/pages/Home/scenes.tsx</code>.</p>
<p>In this file, we export the <code>getBasicScene()</code> function, which will be used in the <code>SceneAppPage</code> on the Home page. To start, we need to consider time: every query requires a time range, and for this application we need a specific interval to retrieve the data stored in qryn within that range.</p>
<p>The <code>SceneTimeRange</code> component manages the selection of time for query requests, and it is used in the <code>SceneTimePicker</code> control. In our example, we will create an instance that represents the time between now and one hour before. However, this range can be customized according to your needs. You don't have to worry about the fixed value, as we will allow the user to customize their selection.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { SceneTimeRange } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">const</span> timeRange = <span class="hljs-keyword">new</span> SceneTimeRange({
    <span class="hljs-attr">from</span>: <span class="hljs-string">'now-1h'</span>,
    <span class="hljs-attr">to</span>: <span class="hljs-string">'now'</span>,
  });
</code></pre>
<p>User input is our next goal. To capture it, we will use various controls that generate variables, each with a specific name and value. The <code>DataSourceVariable</code> lets the user choose a data source from the ones configured in this Grafana instance. Once a data source is selected, a variable is generated; it has a designated name and can be used in queries and other components.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { DataSourceVariable } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">const</span> dsHandler = <span class="hljs-keyword">new</span> DataSourceVariable({
    <span class="hljs-attr">label</span>: <span class="hljs-string">'Data source'</span>,
    <span class="hljs-attr">name</span>: <span class="hljs-string">'ds'</span>, <span class="hljs-comment">// $ds will hold the UID of the selected data source</span>
    <span class="hljs-attr">pluginId</span>: <span class="hljs-string">'loki'</span>
  });
</code></pre>
<p><code>QueryVariable</code> lets you display the results of a query, such as metric names or server names, in a dropdown menu. In this instance, we will ask our qryn data source to return the names of the stream selectors.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { QueryVariable } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">const</span> streamHandler = <span class="hljs-keyword">new</span> QueryVariable({
    <span class="hljs-attr">label</span>: <span class="hljs-string">'Source stream'</span>,
    <span class="hljs-attr">name</span>: <span class="hljs-string">'stream_name'</span>, <span class="hljs-comment">// $stream_name will hold the selected stream</span>
    <span class="hljs-attr">datasource</span>: {
          <span class="hljs-attr">type</span>: <span class="hljs-string">'loki'</span>,
          <span class="hljs-attr">uid</span>: <span class="hljs-string">'$ds'</span> <span class="hljs-comment">// here the value of $ds selected in the DataSourceVariable will be interpolated.</span>
    },
    <span class="hljs-attr">query</span>: <span class="hljs-string">'label_names()'</span>,
  });
</code></pre>
<ul>
<li><code>TextBoxVariable</code> is a free-text input. We will use it to enter the value of the selected stream.</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { TextBoxVariable } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;
<span class="hljs-keyword">const</span> streamValueHandler = <span class="hljs-keyword">new</span> TextBoxVariable({
    <span class="hljs-attr">label</span>: <span class="hljs-string">'Stream value'</span>,
    <span class="hljs-attr">name</span>: <span class="hljs-string">'stream_value'</span>, <span class="hljs-comment">// $stream_value will hold the user input</span>
  });
</code></pre>
<p>We now have time and user input stored in our variables, and we will use both to construct queries. This is where the <code>SceneQueryRunner</code> comes into play: it retrieves data from the qryn Loki data source and delivers the results to one or more visualizations.</p>
<p>Each query is represented as a JSON object containing a reference ID (<code>refId</code>) and an expression (<code>expr</code>) that specifies the query to be executed.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { SceneQueryRunner } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">const</span> queryRunner = <span class="hljs-keyword">new</span> SceneQueryRunner({
    <span class="hljs-attr">datasource</span>: {
          <span class="hljs-attr">type</span>: <span class="hljs-string">'loki'</span>,
          <span class="hljs-attr">uid</span>: <span class="hljs-string">'$ds'</span> <span class="hljs-comment">// here the value of $ds selected in the DataSourceVariable will be interpolated.</span>
    },
    <span class="hljs-attr">queries</span>: [
        {
            <span class="hljs-attr">refId</span>: <span class="hljs-string">'A'</span>,
            <span class="hljs-attr">expr</span>: <span class="hljs-string">'your query here'</span>,
        },
    ],
});
</code></pre>
<h3 id="heading-visualizing-data">Visualizing Data</h3>
<p>The <code>PanelBuilders</code> API provides support for building visualization objects for the supported visualization types, such as <code>Stat</code>, <code>TimeSeries</code>, and <code>Logs</code>.</p>
<h4 id="heading-1-stat-panel"><strong>1. Stat panel</strong></h4>
<p>We will begin by creating a stat visualization. Stat panels display a single prominent value along with an optional sparkline. You can customize the background or value color using thresholds or overrides. To achieve this, we will use a <code>QueryRunner</code> and a <code>PanelBuilder</code>.</p>
<p>To gather the necessary data, we will employ a qryn metric query to visualize the rate at which these logs occur within the specified time frame.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { SceneQueryRunner, PanelBuilders } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;
<span class="hljs-keyword">import</span> { BigValueGraphMode } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/schema'</span>;

<span class="hljs-keyword">const</span> statQueryRunner = <span class="hljs-keyword">new</span> SceneQueryRunner({
    <span class="hljs-attr">datasource</span>: {
        <span class="hljs-attr">type</span>: <span class="hljs-string">'loki'</span>,
        <span class="hljs-attr">uid</span>: <span class="hljs-string">'$ds'</span>
    },
    <span class="hljs-attr">queries</span>: [
        {
            <span class="hljs-attr">refId</span>: <span class="hljs-string">'A'</span>,
            <span class="hljs-attr">expr</span>: <span class="hljs-string">'sum(rate({$stream_name="$stream_value"} [$__auto]))'</span>,
        },
    ],
});

<span class="hljs-keyword">const</span> statPanel = PanelBuilders.stat()
    .setTitle(<span class="hljs-string">'Logs rate / second'</span>)
    .setData(statQueryRunner)
    .setOption(<span class="hljs-string">'graphMode'</span>, BigValueGraphMode.None)
    .setOption(<span class="hljs-string">'reduceOptions'</span>, {
         <span class="hljs-attr">values</span>: <span class="hljs-literal">false</span>,
         <span class="hljs-attr">calcs</span>: [<span class="hljs-string">'mean'</span>],
         <span class="hljs-attr">fields</span>: <span class="hljs-string">''</span>,
    });
</code></pre>
<p>As mentioned earlier, the variables will be replaced with user input:</p>
<p><code>sum(rate({$stream_name="$stream_value"} [$__auto]))</code></p>
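<p>Grafana performs this interpolation internally; a hypothetical sketch of ours (purely for intuition) of the substitution step could be:</p>
<pre><code class="lang-javascript">// Illustrative sketch only: substitute $variables in a query template.
// Unknown variables (such as the built-in $__auto macro) are left untouched.
function interpolate(template, vars) {
  return template.replace(/\$(\w+)/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match;
  });
}

interpolate('sum(rate({$stream_name="$stream_value"} [$__auto]))', {
  stream_name: 'job',
  stream_value: 'webrtc'
});
// → 'sum(rate({job="webrtc"} [$__auto]))'
</code></pre>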
<p>Here's our Panel's logic explained as a sequence: </p>
<ul>
<li><p>Request a <code>Stat</code> from <code>PanelBuilder</code>.</p>
</li>
<li><p>Provide a title.</p>
</li>
<li><p>Tell it where to get the data.</p>
</li>
<li><p>Tell it that we don’t want to see the sparklines, just the big number.</p>
</li>
<li><p>Provide some customizations around how to treat the data. </p>
</li>
</ul>
<p>To display a single numerical value on the stat panel, we customize it to calculate the mean of the provided values. Since metric queries return time series, this customization shows the average of all values in this particular example.</p>
<h4 id="heading-2-time-series"><strong>2. Time series</strong></h4>
<p>The second visualization we’re going to use is a <code>TimeSeries</code> panel, because we want to see how data changes over a period of time.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { SceneQueryRunner, PanelBuilders } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">const</span> timeSeriesQueryRunner = <span class="hljs-keyword">new</span> SceneQueryRunner({
    <span class="hljs-attr">datasource</span>: {
        <span class="hljs-attr">type</span>: <span class="hljs-string">'loki'</span>,
        <span class="hljs-attr">uid</span>: <span class="hljs-string">'$ds'</span>,
    },
    <span class="hljs-attr">queries</span>: [
        {
            <span class="hljs-attr">refId</span>: <span class="hljs-string">'B'</span>,
            <span class="hljs-attr">expr</span>: <span class="hljs-string">'count_over_time({$stream_name="$stream_value"} [$__auto])'</span>,
        },
    ],
});

  <span class="hljs-keyword">const</span> timeSeriesPanel = PanelBuilders
    .timeseries()
    .setTitle(<span class="hljs-string">'Logs over time'</span>)
    .setData(timeSeriesQueryRunner);
</code></pre>
<p>This one is simpler and works out of the box without customization.</p>
<h4 id="heading-3-logs-panel"><strong>3. Logs panel</strong></h4>
<p>For our logs panel we will use a qryn <a target="_blank" href="https://grafana.com/docs/loki/latest/query/log_queries/?pg=blog&amp;plcmt=body-txt&amp;src=li&amp;mdm=social&amp;camp=blog">log query</a> + a log visualization widget.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { SceneQueryRunner, PanelBuilders } <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">const</span> logsQueryRunner = <span class="hljs-keyword">new</span> SceneQueryRunner({
    <span class="hljs-attr">datasource</span>: {
        <span class="hljs-attr">type</span>: <span class="hljs-string">'loki'</span>,
        <span class="hljs-attr">uid</span>: <span class="hljs-string">'$ds'</span>,
    },
    <span class="hljs-attr">queries</span>: [
        {
            <span class="hljs-attr">refId</span>: <span class="hljs-string">'A'</span>,
            <span class="hljs-attr">expr</span>: <span class="hljs-string">'{$stream_name="$stream_value"}'</span>,
            <span class="hljs-attr">maxLines</span>: <span class="hljs-number">20</span>, <span class="hljs-comment">// Use up to 5000</span>
        },
    ],
});

  <span class="hljs-keyword">const</span> logsPanel = PanelBuilders.logs()
    .setTitle(<span class="hljs-string">'Logs'</span>)
    .setData(logsQueryRunner);
</code></pre>
<h3 id="heading-arrange-the-scenes"><strong>Arrange the scenes</strong></h3>
<p>We have now gathered all the necessary building blocks for our application. In the final stage, we pass these objects to the Scenes library and determine how they should be arranged in the UI. To achieve this, we will use a grid layout.</p>
<p>This layout is the default behavior of dashboards in Grafana, and it provides a similar experience for our scenes. Within the grid layout, we will incorporate three <code>SceneGridItem</code> components. To pass the variables to our scene, we will encapsulate them within a <code>SceneVariableSet</code>.</p>
<p>Lastly, we will arrange the scene in a way that grants the user access to certain controls. These controls include:</p>
<ul>
<li><p><code>VariableValueSelectors</code> to modify the variable controls.</p>
</li>
<li><p><code>SceneControlsSpacer</code> to add a little bit of air between objects.</p>
</li>
<li><p><code>SceneTimePicker</code> to customize the time selection.</p>
</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> {
    EmbeddedScene, SceneGridLayout, SceneGridItem, SceneVariableSet,
    VariableValueSelectors, SceneControlsSpacer, SceneTimePicker
} <span class="hljs-keyword">from</span> <span class="hljs-string">'@grafana/scenes'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getBasicScene</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-comment">// all the objects defined in the previous steps go here</span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> EmbeddedScene({
        <span class="hljs-attr">$timeRange</span>: timeRange,
        <span class="hljs-attr">$variables</span>: <span class="hljs-keyword">new</span> SceneVariableSet({
            <span class="hljs-attr">variables</span>: [dsHandler, streamHandler, streamValueHandler],
        }),
        <span class="hljs-attr">body</span>: <span class="hljs-keyword">new</span> SceneGridLayout({
            <span class="hljs-attr">children</span>: [
                <span class="hljs-keyword">new</span> SceneGridItem({
                    <span class="hljs-attr">height</span>: <span class="hljs-number">8</span>,
                    <span class="hljs-attr">width</span>: <span class="hljs-number">8</span>,
                    <span class="hljs-attr">x</span>: <span class="hljs-number">0</span>,
                    <span class="hljs-attr">y</span>: <span class="hljs-number">0</span>,
                    <span class="hljs-attr">body</span>: statPanel.build(),
                }),
                <span class="hljs-keyword">new</span> SceneGridItem({
                    <span class="hljs-attr">height</span>: <span class="hljs-number">8</span>,
                    <span class="hljs-attr">width</span>: <span class="hljs-number">16</span>,
                    <span class="hljs-attr">x</span>: <span class="hljs-number">8</span>,
                    <span class="hljs-attr">y</span>: <span class="hljs-number">0</span>,
                    <span class="hljs-attr">body</span>: timeSeriesPanel.build(),
                }),
                <span class="hljs-keyword">new</span> SceneGridItem({
                    <span class="hljs-attr">height</span>: <span class="hljs-number">8</span>,
                    <span class="hljs-attr">width</span>: <span class="hljs-number">24</span>,
                    <span class="hljs-attr">x</span>: <span class="hljs-number">0</span>,
                    <span class="hljs-attr">y</span>: <span class="hljs-number">4</span>,
                    <span class="hljs-attr">body</span>: logsPanel.build(),
                })
            ],
        }),
        <span class="hljs-attr">controls</span>: [
            <span class="hljs-keyword">new</span> VariableValueSelectors({}),
            <span class="hljs-keyword">new</span> SceneControlsSpacer(),
            <span class="hljs-keyword">new</span> SceneTimePicker({ <span class="hljs-attr">isOnCanvas</span>: <span class="hljs-literal">true</span> }),
        ],
    });
}
</code></pre>
<p>If you followed our tutorial, the results should look similar to the following example:</p>
<p><img src="https://grafana.com/media/blog/scenes-loki/logs-scenes-app-screenshot.png" alt="A screenshot of the Logs Scenes App" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Don’t forget to enter a value for the stream value input for the qryn source</div>
</div>

<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Now we know how to develop a Grafana plugin and how to use the Scenes library to construct interactive interfaces with visualizations using <strong><em>qryn's</em></strong> <em>Loki and LogQL compatible APIs.</em> This experiment further confirms our vision: being <strong>polyglot</strong> rocks!</p>
<p>That's right! Since <strong>qryn</strong> is a <em>drop-in compatible LGTM alternative</em>, any guide or tutorial designed for <strong>Grafana Loki</strong>, <strong>Mimir or Tempo</strong> will work out of the box when used with <strong>qryn</strong> and <a target="_blank" href="https://qryn.cloud"><strong>qryn.cloud</strong></a>, thanks to our custom-made APIs built on top of fast and scalable OLAP databases such as <em>ClickHouse, DuckDB and InfluxDB.</em></p>
<p><em>Ready to try?</em> Deploy <a target="_blank" href="https://qryn.dev">qryn</a> OSS locally or try our <a target="_blank" href="https://qryn.cloud">qryn.cloud</a> managed service 👇</p>
<p><a target="_blank" href="https://qryn.dev"><img src="https://user-images.githubusercontent.com/1423657/218818279-3efff74f-0191-498a-bdc4-f2650c9d3b49.gif" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[LLMs Observability with Traceloop + qryn]]></title><description><![CDATA[OpenLLMetry is a set of extensions built on top of OpenTelemetry that gives you complete observability over your LLM application with minimal complexity.
Because it uses OpenTelemetry under the hood it can be connected to existing observability solut...]]></description><link>https://blog.gigapipe.com/llms-observability-with-traceloop-qryn</link><guid isPermaLink="true">https://blog.gigapipe.com/llms-observability-with-traceloop-qryn</guid><category><![CDATA[llm]]></category><category><![CDATA[openai]]></category><category><![CDATA[LLaMa]]></category><category><![CDATA[observability]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[qryn]]></category><category><![CDATA[gigapipe]]></category><category><![CDATA[ClickHouse]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Thu, 23 Nov 2023 13:58:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1700747676972/90598282-0d4d-4310-96ee-2ceb3919342b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://www.traceloop.com/openllmetry"><img src="https://assets-global.website-files.com/643c87010b62a2009a1f57bf/655cc4769701a8661bb0fe69_illustration-p-1600.png" alt class="image--center mx-auto" /></a></p>
<p><a target="_blank" href="https://www.traceloop.com/openllmetry">OpenLLMetry</a> is a set of extensions built on top of <a target="_blank" href="https://opentelemetry.io/">OpenTelemetry</a> that gives you complete observability over your LLM application with minimal complexity.</p>
<p>Because it uses OpenTelemetry under the hood it can be connected to existing observability solutions such as our <strong>polyglot stack</strong> <a target="_blank" href="https://qryn.dev">qryn</a> and <a target="_blank" href="https://qryn.cloud">qryn.cloud</a> ⭐⭐⭐</p>
<h2 id="heading-step-1-traceloop-sdk-setup">Step 1: Traceloop SDK Setup</h2>
<p><a target="_blank" href="https://www.traceloop.com/openllmetry">OpenLLMetry</a> lets you easily trace prompts and embedding calls of <strong>OpenAI</strong> and can provide a complete view of your OpenAI application using traces and spans.</p>
<p>To get started, install the <a target="_blank" href="https://pypi.org/project/traceloop-sdk/">Traceloop SDK</a> and initialize it within your code.</p>
<h3 id="heading-openai-example">🧠 OpenAI Example</h3>
<p>Automatically log all calls to OpenAI, with prompts and completions</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> traceloop.sdk <span class="hljs-keyword">import</span> Traceloop
<span class="hljs-keyword">from</span> traceloop.sdk.decorators <span class="hljs-keyword">import</span> workflow

Traceloop.init(app_name=<span class="hljs-string">"joke_generation_service"</span>)

<span class="hljs-meta">@workflow(name="joke_creation")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_joke</span>():</span>
    completion = openai.ChatCompletion.create(
        model=<span class="hljs-string">"gpt-3.5-turbo"</span>,
        messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Tell me a joke about opentelemetry"</span>}],
    )

    <span class="hljs-keyword">return</span> completion.choices[<span class="hljs-number">0</span>].message.content
</code></pre>
<h3 id="heading-llama-example">🦙 LLAMA Example</h3>
<p>Automatically log all calls to LLAMA models, with prompts and completions</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> chromadb
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> openai

<span class="hljs-keyword">from</span> llama_index <span class="hljs-keyword">import</span> VectorStoreIndex, SimpleDirectoryReader, ServiceContext
<span class="hljs-keyword">from</span> llama_index.vector_stores <span class="hljs-keyword">import</span> ChromaVectorStore
<span class="hljs-keyword">from</span> llama_index.storage.storage_context <span class="hljs-keyword">import</span> StorageContext
<span class="hljs-keyword">from</span> llama_index.embeddings <span class="hljs-keyword">import</span> HuggingFaceEmbedding
<span class="hljs-keyword">from</span> traceloop.sdk <span class="hljs-keyword">import</span> Traceloop

openai.api_key = os.environ[<span class="hljs-string">"OPENAI_API_KEY"</span>]

<span class="hljs-comment"># Initialize Traceloop</span>
Traceloop.init()

chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection(<span class="hljs-string">"quickstart"</span>)

<span class="hljs-comment"># define embedding function</span>
embed_model = HuggingFaceEmbedding(model_name=<span class="hljs-string">"BAAI/bge-base-en-v1.5"</span>)

<span class="hljs-comment"># load documents</span>
documents = SimpleDirectoryReader(<span class="hljs-string">"./data/my_docs/"</span>).load_data()

<span class="hljs-comment"># set up ChromaVectorStore and load in data</span>
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

<span class="hljs-comment"># Query Data</span>
query_engine = index.as_query_engine()
response = query_engine.query(<span class="hljs-string">"Summarize the documents in context"</span>)
</code></pre>
<p>For more information refer to the <a target="_blank" href="https://www.traceloop.com/docs/openllmetry/introduction">Traceloop SDK Documentation</a></p>
<h2 id="heading-step-2-grafana-agent-sender">Step 2: Grafana Agent Sender</h2>
<p>Configure a <a target="_blank" href="https://grafana.com/docs/agent/latest/static/set-up/install/">Grafana Agent</a> instance to feed <strong>Traceloop</strong> traces into <strong>qryn</strong> / <strong>qryn.cloud</strong></p>
<pre><code class="lang-yaml"><span class="hljs-attr">traces:</span>
  <span class="hljs-attr">configs:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">default</span>
      <span class="hljs-attr">remote_write:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">endpoint:</span> <span class="hljs-string">&lt;Gigapipe</span> <span class="hljs-string">qryn.cloud</span> <span class="hljs-string">endpoint&gt;:443</span>
          <span class="hljs-attr">basic_auth:</span>
            <span class="hljs-attr">username:</span> <span class="hljs-string">&lt;Gigapipe</span> <span class="hljs-string">qryn</span> <span class="hljs-string">X-API-Key&gt;</span>
            <span class="hljs-attr">password:</span> <span class="hljs-string">&lt;Gigapipe</span> <span class="hljs-string">qryn</span> <span class="hljs-string">X-API-Secret&gt;</span>
      <span class="hljs-attr">receivers:</span>
        <span class="hljs-attr">otlp:</span>
          <span class="hljs-attr">protocols:</span>
            <span class="hljs-attr">grpc:</span>

<span class="hljs-comment"># Environment variable for your local app with Traceloop</span>
<span class="hljs-string">TRACELOOP_BASE_URL=http://&lt;grafana-agent-hostname&gt;:4318</span>
</code></pre>
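<p>In application code, the same environment variable can be set before initializing the SDK. A minimal sketch — the agent hostname here is a placeholder, not a real endpoint:</p>

```python
import os

# Point the Traceloop SDK at the Grafana Agent OTLP receiver.
# "grafana-agent" is a placeholder for your agent's hostname.
os.environ["TRACELOOP_BASE_URL"] = "http://grafana-agent:4318"

# A subsequent Traceloop.init() will export traces to the agent,
# which forwards them to qryn / qryn.cloud via remote_write.
```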
<p><strong><mark>That's it! You're now ready to explore your LLMs activity using qryn</mark></strong></p>
<p>You can immediately get started with some popular examples:</p>
<h4 id="heading-trace-prompts-and-completionshttpsgithubcomtraceloopopenllmetryblobmainpackagessample-appsampleappopenaistreamingpy">👉 <a target="_blank" href="https://github.com/traceloop/openllmetry/blob/main/packages/sample-app/sample_app/openai_streaming.py">Trace prompts and completions</a></h4>
<p>Call OpenAI and see prompts, completions, and token usage for your call.</p>
<p>👉 <a target="_blank" href="https://github.com/traceloop/openllmetry/blob/main/packages/sample-app/sample_app/chroma_app.py">Trace your RAG retrieval pipeline</a></p>
<p>Build a RAG pipeline with Chroma and OpenAI. See the vectors returned from Chroma, the full prompt sent to OpenAI, and the responses.</p>
<h3 id="heading-are-you-ready"><strong>Are you Ready?</strong></h3>
<p>Sign up for a free account on <a target="_blank" href="https://qryn.cloud/"><strong>qryn.cloud</strong></a> or install our <a target="_blank" href="https://qryn.dev/"><strong>OSS stack</strong></a> on-premise ⭐</p>
<p><a target="_blank" href="https://qryn.cloud/"><img src="https://user-images.githubusercontent.com/1423657/218818279-3efff74f-0191-498a-bdc4-f2650c9d3b49.gif" alt /></a></p>
]]></content:encoded></item><item><title><![CDATA[Embedded OLAP Benchmarks]]></title><description><![CDATA[Benchmarks for embedded SQL databases are rarely impartial and they tend to use larger-than-life configurations by each vendor in their race for the market peak. The typical megalomaniac benches ($$$) are usually both scientifically and technically a...]]></description><link>https://blog.gigapipe.com/embedded-olap-benchmarks</link><guid isPermaLink="true">https://blog.gigapipe.com/embedded-olap-benchmarks</guid><category><![CDATA[ClickHouse]]></category><category><![CDATA[duckDB]]></category><category><![CDATA[Benchmark]]></category><category><![CDATA[OLAP]]></category><category><![CDATA[github-actions]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Mon, 13 Nov 2023 07:00:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1699799734329/dd84ef8e-f6c5-4d49-b11b-8c397c1cd17f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://user-images.githubusercontent.com/1423657/278132064-ba8f08fe-49db-4f77-a2b5-71181e87233e.png" alt /></p>
<p>Benchmarks for embedded SQL databases are rarely impartial, and vendors tend to use larger-than-life configurations in their race for the top of the market. The typical megalomaniac benches ($$$) are usually both scientifically and technically accurate, but they rarely translate into anything useful for us regular end-users, who deploy these databases on the lowest-tier options using cheap resources.</p>
<p><em>With this in mind, we decided to assemble a simple and "fair" benchmarking tool</em></p>
<p>The rules for our little game are simple:</p>
<ul>
<li><p>No reliance on dedicated CPUs and fast RAM. <em>Cheap stuff only.</em></p>
</li>
<li><p>Focus on <em>real</em> network operations with <em>remote</em> parquet files.</p>
</li>
<li><p>Mimicking <em>real-life scenarios</em> and queries <em>(as a query collection)</em></p>
</li>
</ul>
<h3 id="heading-github-actions">GitHub Actions</h3>
<p>GitHub Actions runners are a great example of "cheap resources" <em>(as in free)</em>, offering 2 vCPUs and 7 GB of RAM per execution. We decided to use Actions to benchmark our target databases, sharing the pros and cons uniformly across all contenders.</p>
<h3 id="heading-python">Python</h3>
<p>Python will be our playground since most embedded OLAP databases support it.</p>
<p>Our runner spawns the same SQL tests against each contending database in separate parallel sessions, operating with equal CPU/RAM/network resources and remote parquet files, plus local tests for balance. No caching allowed.</p>
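<p>The core of such a runner is nothing fancy: time the same query a few times and keep the aggregate stats. A minimal sketch of the idea — the workload here is a stand-in for the real database calls, not the repo's actual harness:</p>

```python
import time

def bench(name, fn, runs=3):
    """Run fn `runs` times and report avg/min/max wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # e.g. execute "SELECT count(*) FROM read_parquet(...)"
        timings.append(time.perf_counter() - start)
    avg, lo, hi = sum(timings) / len(timings), min(timings), max(timings)
    print(f"{name}: avg={avg:.3f}s min={lo:.3f}s max={hi:.3f}s ({runs} runs)")
    return avg, lo, hi

# Stand-in workload instead of a real database query
stats = bench("demo:count", lambda: sum(range(100_000)))
```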
<p><img src="https://i.ytimg.com/vi/wW62ONjSA-0/maxresdefault.jpg?sqp=-oaymwEmCIAKENAF8quKqQMa8AEB-AHUBoAC4AOKAgwIABABGHIgVSgfMA8=&amp;rs=AOn4CLAQBnVaDIYrOLhXMbDD0T0uto5nSA" alt="A Few Hours Later… - SpongeBob Time Card - YouTube" /></p>
<p>A few hours later our first action-based <a target="_blank" href="https://github.com/lmangani/embedded-db-benchmarks">benchmark playground</a> was ready to run with tests covering our first group of embedded SQL OLAP engines:</p>
<ul>
<li><p><a target="_blank" href="https://doc.chdb.io"><strong>chdb</strong></a></p>
</li>
<li><p><a target="_blank" href="https://duckdb.org"><strong>duckdb</strong></a></p>
</li>
<li><p><a target="_blank" href="https://glaredb.com"><strong>glaredb</strong></a></p>
</li>
<li><p><strong>databend</strong></p>
</li>
</ul>
<p>⏩ If you want to fast-forward to the code and real action reports:</p>
<blockquote>
<p>GitHub Repo: <a target="_blank" href="https://github.com/lmangani/embedded-db-benchmarks">https://github.com/lmangani/embedded-db-benchmarks</a></p>
</blockquote>
<p>Test groups are executed on demand with a customizable number of iterations:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699795477171/86db3029-3220-401f-bd15-56b84b5fd00f.png" alt class="image--center mx-auto" /></p>
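<p>A stripped-down version of such a workflow might look like this — file names, inputs and steps are illustrative, not the exact workflow in the repo:</p>

```yaml
name: embedded-db-benchmark
on:
  workflow_dispatch:
    inputs:
      iterations:
        description: "Number of runs per query"
        default: "3"
jobs:
  bench:
    runs-on: ubuntu-latest   # hosted runner: 2 vCPUs / 7 GB RAM
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python bench.py --iterations ${{ github.event.inputs.iterations }}
```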
<h3 id="heading-query-results">Query Results</h3>
<p>For each run and each database runner, separate query statistics are collected.</p>
<p>Here are some sample speed results for <strong>chdb</strong> <em>(0.15.0)</em> and <strong>duckdb</strong> <em>(0.9.1):</em></p>
<pre><code class="lang-python">Testing chdb (<span class="hljs-number">0.15</span><span class="hljs-number">.0</span>)
chdb:version: avg=<span class="hljs-number">0.018</span>s min=<span class="hljs-number">0.017</span>s max=<span class="hljs-number">0.020</span>s (<span class="hljs-number">3</span> runs)
chdb:count: avg=<span class="hljs-number">0.312</span>s min=<span class="hljs-number">0.210</span>s max=<span class="hljs-number">0.502</span>s (<span class="hljs-number">3</span> runs)
chdb:groupby: avg=<span class="hljs-number">0.772</span>s min=<span class="hljs-number">0.742</span>s max=<span class="hljs-number">0.824</span>s (<span class="hljs-number">3</span> runs)
chdb:groupby-local: avg=<span class="hljs-number">0.436</span>s min=<span class="hljs-number">0.432</span>s max=<span class="hljs-number">0.441</span>s (<span class="hljs-number">3</span> runs)

Testing duckdb (<span class="hljs-number">0.9</span><span class="hljs-number">.1</span>)
duckdb:version: avg=<span class="hljs-number">0.000</span>s min=<span class="hljs-number">0.000</span>s max=<span class="hljs-number">0.001</span>s (<span class="hljs-number">3</span> runs)
duckdb:count: avg=<span class="hljs-number">0.358</span>s min=<span class="hljs-number">0.120</span>s max=<span class="hljs-number">0.823</span>s (<span class="hljs-number">3</span> runs)
duckdb:groupby: avg=<span class="hljs-number">0.778</span>s min=<span class="hljs-number">0.769</span>s max=<span class="hljs-number">0.793</span>s (<span class="hljs-number">3</span> runs)
duckdb:groupby-local: avg=<span class="hljs-number">0.498</span>s min=<span class="hljs-number">0.494</span>s max=<span class="hljs-number">0.505</span>s (<span class="hljs-number">3</span> runs)
</code></pre>
<p>Each group runs all tests in parallel, making it easy to compare changes over time.</p>
<p>For instance, here's the latest <strong>chdb</strong> <em>(0.16.0rc2)</em> gaining speed on parquet counts 🔥</p>
<pre><code class="lang-python">Testing chdb (<span class="hljs-number">0.16</span><span class="hljs-number">.0</span>rc2)
chdb:version: avg=<span class="hljs-number">0.011</span>s min=<span class="hljs-number">0.011</span>s max=<span class="hljs-number">0.012</span>s (<span class="hljs-number">5</span> runs)
chdb:count: avg=<span class="hljs-number">0.160</span>s min=<span class="hljs-number">0.082</span>s max=<span class="hljs-number">0.386</span>s (<span class="hljs-number">5</span> runs) 🔥
chdb:groupby: avg=<span class="hljs-number">0.445</span>s min=<span class="hljs-number">0.407</span>s max=<span class="hljs-number">0.496</span>s (<span class="hljs-number">5</span> runs) 🔥
chdb:groupby-local: avg=<span class="hljs-number">0.338</span>s min=<span class="hljs-number">0.325</span>s max=<span class="hljs-number">0.344</span>s (<span class="hljs-number">5</span> runs) 🔥

Testing duckdb (<span class="hljs-number">0.9</span><span class="hljs-number">.1</span>)
duckdb:version: avg=<span class="hljs-number">0.000</span>s min=<span class="hljs-number">0.000</span>s max=<span class="hljs-number">0.000</span>s (<span class="hljs-number">5</span> runs) 🔥
duckdb:count: avg=<span class="hljs-number">0.259</span>s min=<span class="hljs-number">0.098</span>s max=<span class="hljs-number">0.894</span>s (<span class="hljs-number">5</span> runs)
duckdb:groupby: avg=<span class="hljs-number">0.571</span>s min=<span class="hljs-number">0.566</span>s max=<span class="hljs-number">0.576</span>s (<span class="hljs-number">5</span> runs)
duckdb:groupby-local: avg=<span class="hljs-number">0.341</span>s min=<span class="hljs-number">0.334</span>s max=<span class="hljs-number">0.351</span>s (<span class="hljs-number">5</span> runs)
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em>CPU, Memory and Network utilization are also collected during tests</em></div>
</div>

<p>With each test, there's a pinch of luck involved. This is why multiple runs are needed.</p>
<p>⚠️ <mark>Remember our bench does not want to be authoritative - it wants to be realistic!</mark></p>
<h3 id="heading-full-reports">Full Reports</h3>
<p>The benchmarking actions track system resource utilization during the tests.</p>
<p>For instance, we can see how <strong>chdb</strong> peaks its CPU utilization at 23% while <strong>duckdb</strong> reaches 36% within the same timeframe of execution. Pretty interesting 🔥</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699795950820/424aedeb-11bb-4bb2-8fef-34b9fcb74b82.png" alt class="image--center mx-auto" /></p>
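<p>Per-test resource tracking can be approximated with the Python standard library alone — for example, peak memory via <code>tracemalloc</code> and CPU time via <code>time.process_time</code>. A rough sketch of the idea, not the exact collector used by the actions:</p>

```python
import time
import tracemalloc

def run_with_resources(fn):
    """Run fn and return (result, cpu_seconds, peak_mem_mb)."""
    tracemalloc.start()
    cpu_start = time.process_time()
    result = fn()
    cpu_used = time.process_time() - cpu_start
    _, peak = tracemalloc.get_traced_memory()  # bytes allocated at peak
    tracemalloc.stop()
    return result, cpu_used, peak / 1024 / 1024

# Stand-in workload: allocate a list instead of running a query
result, cpu_s, peak_mb = run_with_resources(lambda: list(range(200_000)))
print(f"cpu={cpu_s:.3f}s peak_mem={peak_mb:.2f} MB")
```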
<h3 id="heading-first-conclusions">First Conclusions</h3>
<p>Here's one of the latest runs: <strong>chdb</strong> and <strong>duckdb</strong>, followed by <strong>glaredb</strong>, compete for the top positions, while <strong>databend</strong> and <strong>datafusion</strong> turn out the slowest and most memory-hungry. <em>Network fluctuations and latency issues play a role - just like in real life!</em></p>
<p><mark>Do you think our tests are penalizing any engine? Shall we run multiple rounds and aggregate results? Please submit a PR or open an issue and we'll try anything!</mark></p>
<pre><code class="lang-python">Testing chdb <span class="hljs-number">0.16</span><span class="hljs-number">.0</span>rc2 (<span class="hljs-number">23.10</span><span class="hljs-number">.1</span><span class="hljs-number">.1</span>)
chdb:version: avg=<span class="hljs-number">0.012</span>s min=<span class="hljs-number">0.011</span>s max=<span class="hljs-number">0.014</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">2.47</span> MB
chdb:count: avg=<span class="hljs-number">0.135</span>s min=<span class="hljs-number">0.064</span>s max=<span class="hljs-number">0.264</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">3.91</span> MB
chdb:groupby: avg=<span class="hljs-number">0.435</span>s min=<span class="hljs-number">0.407</span>s max=<span class="hljs-number">0.478</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">25.98</span> MB

Testing duckdb <span class="hljs-number">0.9</span><span class="hljs-number">.1</span>
duckdb:version: avg=<span class="hljs-number">0.001</span>s min=<span class="hljs-number">0.000</span>s max=<span class="hljs-number">0.001</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">2.96</span> MB
duckdb:count: avg=<span class="hljs-number">0.360</span>s min=<span class="hljs-number">0.083</span>s max=<span class="hljs-number">0.900</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">26.02</span> MB
duckdb:groupby: avg=<span class="hljs-number">0.697</span>s min=<span class="hljs-number">0.685</span>s max=<span class="hljs-number">0.715</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">25.86</span> MB

Testing glaredb <span class="hljs-number">0.5</span><span class="hljs-number">.1</span>
glaredb:version: avg=<span class="hljs-number">0.001</span>s min=<span class="hljs-number">0.000</span>s max=<span class="hljs-number">0.001</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">11.38</span> MB
glaredb:count: avg=<span class="hljs-number">0.157</span>s min=<span class="hljs-number">0.071</span>s max=<span class="hljs-number">0.307</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">9.00</span> MB
glaredb:groupby: avg=<span class="hljs-number">0.489</span>s min=<span class="hljs-number">0.482</span>s max=<span class="hljs-number">0.496</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">200.90</span> MB

Testing databend <span class="hljs-number">1.2</span><span class="hljs-number">.207</span>
databend:version: avg=<span class="hljs-number">0.013</span>s min=<span class="hljs-number">0.001</span>s max=<span class="hljs-number">0.038</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">3.50</span> MB
databend:count: avg=<span class="hljs-number">0.237</span>s min=<span class="hljs-number">0.216</span>s max=<span class="hljs-number">0.277</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">7.50</span> MB
databend:groupby: avg=<span class="hljs-number">1.629</span>s min=<span class="hljs-number">1.580</span>s max=<span class="hljs-number">1.674</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">462.03</span> MB

Testing datafusion <span class="hljs-number">32.0</span><span class="hljs-number">.0</span>
datafusion:version: avg=<span class="hljs-number">0.016</span>s min=<span class="hljs-number">0.001</span>s max=<span class="hljs-number">0.045</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">3.62</span> MB
datafusion:count: avg=<span class="hljs-number">0.243</span>s min=<span class="hljs-number">0.179</span>s max=<span class="hljs-number">0.338</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">7.12</span> MB
datafusion:groupby: avg=<span class="hljs-number">1.860</span>s min=<span class="hljs-number">1.820</span>s max=<span class="hljs-number">1.920</span>s (<span class="hljs-number">3</span> runs) | Memory used: <span class="hljs-number">474.79</span> MB
</code></pre>
<h3 id="heading-cloudbench">CloudBench</h3>
<p>This way of testing is showing potential, but why reinvent the wheel? Due to popular demand, we're porting our action-based benchmarks to the <strong>ClickBench</strong> format.</p>
<p><a target="_blank" href="https://lmangani.github.io/CloudBench/#eyJzeXN0ZW0iOnsiY2hkYiI6dHJ1ZSwiZGF0YWJlbmQiOnRydWUsIkR1Y2tEQiI6dHJ1ZSwiZ2xhcmVkYiI6dHJ1ZX0sInR5cGUiOnsiQysrIjp0cnVlLCJjb2x1bW4tb3JpZW50ZWQiOnRydWUsIkNsaWNrSG91c2UgZGVyaXZhdGl2ZSI6dHJ1ZSwiZW1iZWRkZWQiOnRydWUsImdpdGh1YiI6dHJ1ZX0sIm1hY2hpbmUiOnsiZ2l0aHViX3J1bm5lciAydkNQVV83R0IiOnRydWV9LCJjbHVzdGVyX3NpemUiOnsic2VydmVybGVzcyI6dHJ1ZX0sIm1ldHJpYyI6ImhvdCIsInF1ZXJpZXMiOlt0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWVdfQ=="><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1700130025251/42166a05-cffe-4a85-8664-a994132614de.png" alt class="image--center mx-auto" /></a></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em><mark>Resilience to network fluctuations &amp; latency is an integral part of the test</mark></em></div>
</div>

<h3 id="heading-observability">Observability</h3>
<p>The reports are still quite generic, but they provide useful insight into test execution directly from our actions, without any cost or external service required. To determine anything scientific, however, we need to collect thousands of executions over time.</p>
<p>To track and analyze the performance of each test over time we can configure our Reports to be shipped as complete <strong>logs, metrics</strong> and <strong>traces</strong> directly into <a target="_blank" href="https://qryn.cloud">qryn</a> 👇👇</p>
<p><a target="_blank" href="https://qryn.cloud"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1699799618952/d53bd0be-17e2-482e-ab7c-c5d0d20ffc9e.png" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item><item><title><![CDATA[OpenAI Observability]]></title><description><![CDATA[If you are building Applications with OpenAI's API this post is for you!

Introduction
In an era where AI and machine learning are at the forefront of technological advancements, services like OpenAI and ChatGPT have gained immense popularity for the...]]></description><link>https://blog.gigapipe.com/openai-observability</link><guid isPermaLink="true">https://blog.gigapipe.com/openai-observability</guid><category><![CDATA[openai]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[observability]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Lorenzo Mangani]]></dc:creator><pubDate>Fri, 03 Nov 2023 10:23:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1698974506089/5eaaf9dc-4a22-426c-94fb-73a51993521c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://allvectorlogo.com/img/2017/07/openai-logo.png" alt /></p>
<blockquote>
<p>If you are building Applications with <strong>OpenAI's API</strong> this post is for you!</p>
</blockquote>
<h3 id="heading-introduction">Introduction</h3>
<p>In an era where <strong>AI</strong> and <strong>machine learning</strong> are at the forefront of technological advancements, services like <strong>OpenAI</strong> and <strong>ChatGPT</strong> have gained immense popularity for their ability to transform industries, streamline customer interactions, and automate various processes.</p>
<p>However, <strong><em>with great power comes great responsibility</em></strong>: monitoring and observability are vital to ensuring the smooth operation and optimal utilization of such AI services - and to avoiding <em>unpleasant surprises</em> <em>(=costs)</em></p>
<p>In this quick article, we will explore how the <a target="_blank" href="http://qryn.cloud">qryn.cloud</a> polyglot observability stack can collect <strong>Metrics</strong> and <strong>Logs</strong> to provide valuable insights into the performance, behavior, and usage patterns of <strong>OpenAI</strong> and <strong>ChatGPT</strong>, enabling organizations to harness the platform's full potential while ensuring <em>reliability and efficiency.</em></p>
<h3 id="heading-benefits-of-monitoring-openai">Benefits of Monitoring OpenAI ✨</h3>
<p>Here are some <strong>good reasons</strong> to Monitor OpenAI <em>- as explained by ChatGPT:</em></p>
<ol>
<li><p><strong>Performance Optimization:</strong> Effective monitoring allows organizations to keep a close eye on the performance of OpenAI and ChatGPT. By collecting and analyzing metrics, businesses can identify bottlenecks, latency issues, and areas where optimization is needed. This, in turn, leads to improved response times and enhanced user experiences.</p>
</li>
<li><p><strong>Cost Management:</strong> Running AI models like ChatGPT can be resource-intensive. Through comprehensive monitoring, you can gain insights into usage patterns and cost trends. This data enables informed decisions about resource allocation, helping organizations optimize their budget and prevent unexpected overages.</p>
</li>
<li><p><strong>User Experience Enhancement:</strong> Understanding how users interact with ChatGPT and OpenAI is crucial for delivering a seamless user experience. Observing user behavior and analyzing logs can uncover pain points, frequently asked questions, and other valuable insights to tailor AI responses and services to better meet user needs.</p>
</li>
<li><p><strong>Security and Compliance:</strong> Security is a top concern when dealing with sensitive data or information. Effective monitoring helps detect and prevent security breaches, unauthorized access, and potential vulnerabilities. It also aids in ensuring compliance with data protection regulations and industry standards.</p>
</li>
<li><p><strong>Predictive Maintenance:</strong> Proactive monitoring can help identify issues before they become critical. By setting up alerts for anomalies and unusual behavior, organizations can implement preventive measures, reducing downtime and the risk of service disruptions.</p>
</li>
<li><p><strong>Scaling and Resource Allocation:</strong> As demand for AI services fluctuates, organizations need to be agile in scaling resources. Monitoring helps in understanding usage patterns and trends, enabling efficient resource allocation and scaling to meet demand, whether for peak hours or seasonal changes.</p>
</li>
<li><p><strong>Customization and Improvement:</strong> Observability data can provide insights into how users are interacting with AI models. This information can be used to refine and customize AI responses, improving the quality of interactions and ultimately enhancing user satisfaction.</p>
</li>
</ol>
<h3 id="heading-requirements">Requirements</h3>
<blockquote>
<p>For this experiment we can use any <strong>Linux</strong> system with <strong>Python3.x</strong>.</p>
</blockquote>
<p>Before starting, install the <strong>grafana-openai-monitoring</strong> dependency using <strong>pip</strong></p>
<pre><code class="lang-bash">pip install grafana-openai-monitoring
</code></pre>
<p>We will need a couple of tokens to configure our monitoring script:</p>
<ol>
<li><p><strong>OpenAI API Key</strong></p>
</li>
<li><p><strong>Scoped Token for qryn.cloud</strong> <em>(not needed for qryn oss)</em></p>
</li>
</ol>
<h3 id="heading-its-monitoring-time">It's Monitoring Time 🔭</h3>
<p>To monitor <strong>Chat completions</strong> using the OpenAI API, you can use the <code>chat_v2.monitor</code> decorator. This decorator automatically tracks <strong>API calls</strong> and sends <strong>metrics</strong> and <strong>logs</strong> to the qryn or qryn.cloud endpoints.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> grafana_openai_monitoring <span class="hljs-keyword">import</span> chat_v2

<span class="hljs-comment"># Set your OpenAI API key</span>
openai.api_key = <span class="hljs-string">"YOUR_OPEN_AI_API_KEY"</span>

<span class="hljs-comment"># Apply the custom decorator to the OpenAI API function</span>
openai.ChatCompletion.create = chat_v2.monitor(
  openai.ChatCompletion.create,
  metrics_url=<span class="hljs-string">"https://qryn.gigapipe.com/api/v1/prom/remote/write"</span>,
  logs_url=<span class="hljs-string">"https://qryn.gigapipe.com/loki/api/v1/push"</span>,
  metrics_username=<span class="hljs-string">"X-API-Key"</span>,
  logs_username=<span class="hljs-string">"X-API-Key"</span>,
  access_token=<span class="hljs-string">"X-API-Secret"</span>
  )

<span class="hljs-comment"># Now any call to openai.ChatCompletion.create will be automatically tracked</span>
response = openai.ChatCompletion.create(model=<span class="hljs-string">"gpt-4"</span>, max_tokens=<span class="hljs-number">100</span>, messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"What is Observability?"</span>}])

print(response)
</code></pre>
<p>To monitor <strong>Completions</strong> using the OpenAI API, you can use the <code>chat_v1.monitor</code> decorator. This decorator wraps the <strong>OpenAI API</strong> function and sends <strong>metrics</strong> and <strong>logs</strong> to the qryn or qryn.cloud endpoints.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai
<span class="hljs-keyword">from</span> grafana_openai_monitoring <span class="hljs-keyword">import</span> chat_v1

<span class="hljs-comment"># Set your OpenAI API key</span>
openai.api_key = <span class="hljs-string">"YOUR_OPEN_AI_API_KEY"</span>

<span class="hljs-comment"># Apply the custom decorator to the OpenAI API function</span>
openai.Completion.create = chat_v1.monitor(
  openai.Completion.create,
  metrics_url=<span class="hljs-string">"https://qryn.gigapipe.com/api/v1/prom/remote/write"</span>,
  logs_url=<span class="hljs-string">"https://qryn.gigapipe.com/loki/api/v1/push"</span>,
  metrics_username=<span class="hljs-string">"X-API-Key"</span>,
  logs_username=<span class="hljs-string">"X-API-Key"</span>,
  access_token=<span class="hljs-string">"X-API-Secret"</span>
  )

<span class="hljs-comment"># Now any call to openai.Completion.create will be automatically tracked</span>
response = openai.Completion.create(model=<span class="hljs-string">"davinci"</span>, max_tokens=<span class="hljs-number">100</span>, prompt=<span class="hljs-string">"What is Observability?"</span>)

print(response)
</code></pre>
<p><em>Once configured, the monitored API functions automatically log and track requests and responses to the specified endpoints.</em></p>
<h3 id="heading-grafana-dashboard">Grafana Dashboard 🛸</h3>
<p>Once our data is ingested into <strong>qryn</strong> or <strong>qryn.cloud</strong>, we can <a target="_blank" href="https://gist.github.com/lmangani/8bff9c79a1fe7bf7e9b7b3c2a4e7cacc#file-grafana_openai_dashboard-json">download and import the OpenAI Dashboard</a> in our connected Grafana instance to <em>display metrics and logs.</em></p>
<p><a target="_blank" href="https://qryn.cloud"><img src="https://grafana.com/media/blog/monitor-openai/openai-dashboard-grafana-cloud-2.png" alt class="image--center mx-auto" /></a></p>
<h3 id="heading-potential-unlocked">Potential Unlocked 🔥</h3>
<p>Monitoring OpenAI and ChatGPT with the <a target="_blank" href="http://qryn.cloud">qryn.cloud</a> polyglot observability stack is not just about tracking metrics and logs; it's about <strong>unlocking the full potential of AI services</strong> while ensuring <strong>reliability, security, and cost-effectiveness</strong>. With the right observability tools and practices in place, organizations can harness the power of AI to its fullest, staying competitive in a rapidly evolving technological landscape.</p>
<h3 id="heading-are-you-ready">Are you Ready?</h3>
<p>Sign up for a free account on <a target="_blank" href="https://qryn.cloud">qryn.cloud</a> or install our <a target="_blank" href="https://qryn.dev">oss stack</a> on-premise ⭐</p>
<p><a target="_blank" href="https://qryn.cloud"><img src="https://user-images.githubusercontent.com/1423657/218818279-3efff74f-0191-498a-bdc4-f2650c9d3b49.gif" alt class="image--center mx-auto" /></a></p>
]]></content:encoded></item></channel></rss>