Pixie can deploy bpftrace programs to your cluster, collect the resulting data, and display it in the Live UI. This tutorial demonstrates how to run a bpftrace program using a PxL script and discusses the guidelines for running arbitrary bpftrace code using Pixie.
Most of the data in Pixie's no-instrumentation monitoring platform is collected by the Pixie Edge Modules (PEMs), which are deployed as a daemonset onto every node in your cluster. These PEMs use eBPF-based tracing to collect network transactions without any code changes.
One increasingly popular way to write eBPF programs is to use bpftrace, an open source high-level tracing language for Linux. bpftrace provides a simplified front-end language that makes it easier to write BPF programs than with frameworks such as BCC. Many bpftrace programs are written as one-liners or stand-alone scripts.
Now, using Pixie, developers can dynamically run their own bpftrace programs on their cluster. Pixie will handle deploying the program, capturing the output of `printf` statements into a table, and making the data available in the Live UI.
bpftrace programs output data through a variety of built-in functions. Examples include `printf` for general purpose printing, `print` for printing map contents, and `time` for printing the current time. bpftrace also automatically prints all maps on termination, which many bpftrace programs rely on.
Pixie's distributed bpftrace deployment feature captures outputs made through bpftrace `printf` statements and pushes the arguments into an automatically created table.
There are some requirements for bpftrace programs you wish to deploy with Pixie, all of which concern the output mechanism:
- The program must contain at least one `printf` statement.
- If the program contains more than one `printf` statement, the format string of all `printf`s must be exactly the same, as it defines the table output columns.
- The program must not contain `printf` statements in the `BEGIN` or `END` blocks.
- To name the output columns, prefix each format specifier with the column name and a colon (e.g. `name:%d`). The column names cannot contain any whitespace.
- To output time, use the reserved column name `time_` and pass the argument `nsecs`.
Note that not all programs in the bpftrace repository meet these requirements, but most can be easily adapted to be compatible. For example, in programs with multiple `printf`s, the extraneous `printf`s can be removed. Also, programs that output data on termination instead of through `printf` statements can be converted to print the data on a regular interval using an `interval` block. `pidpersec.bt` is a good example of this design pattern.
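To make the column-naming rules concrete, here is a minimal Python sketch of how a format string could be checked before deployment. It is not Pixie's implementation: `columns_from_format` and `check_printfs` are hypothetical helpers written for this example, and the sketch assumes that fields in the format string are separated by whitespace, as they are in the tutorial's script.

```python
def columns_from_format(fmt):
    """Return the column names encoded as name:%spec tokens in a printf format string."""
    columns = []
    for token in fmt.split():
        name, sep, spec = token.partition(":")
        # Each token must look like name:%spec (for example src_port:%d).
        if not sep or not name or not spec.startswith("%"):
            raise ValueError(f"token {token!r} is not of the form name:%spec")
        columns.append(name)
    return columns


def check_printfs(format_strings):
    """Check the printf rules listed above and return the resulting column names."""
    if not format_strings:
        raise ValueError("the program needs at least one printf statement")
    if len(set(format_strings)) != 1:
        raise ValueError("all printf format strings must be exactly the same")
    return columns_from_format(format_strings[0])


fmt = "time_:%llu src_ip:%s src_port:%d dst_ip:%s dst_port:%d"
print(check_printfs([fmt]))
```

Running a check like this on the format strings of a program you plan to adapt is a quick way to spot a `printf` that would not map cleanly onto table columns.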
This beta feature has limitations:
In this demo, we'll deploy Dale Hamel's bpftrace TCP retransmit tool using Pixie. TCP retransmits are usually a sign of poor network health and this open-source tool will help us discover if any connections in our cluster are experiencing a high number of retransmits.
We've incorporated this trace into a PxL script called `px/tcp_retransmits`. To run this script:
- Select the `px/tcp_retransmits` script using the drop-down `script` menu or with Pixie Command. Pixie Command can be opened with the `ctrl/cmd+k` keyboard shortcut.
- Run the script using the `ctrl/cmd+enter` keyboard shortcut.
Once the probe is deployed to all the nodes in the cluster, the probes will begin to push out data into tables. The PxL script queries this data and the Vis Spec defines how this data will be displayed.
In the Live View, we'll see a graph of the pods (hexagonal grey box icons) and the services (hexagonal grey tree icons) that are experiencing TCP retransmits.
The color and weight of the arrows between these entities indicate the number of retransmits. Hovering over an arrow will display the number of retransmits for a particular connection. The data displayed in this graph can also be seen in the Data Drawer (use the `ctrl/cmd+d` keyboard shortcut to open and close this table).
In this particular example, the 3 pods experiencing high levels of retransmits are located on the same node, perhaps indicating an issue with that particular node.
```python
 1  # Copyright (c) Pixie Labs, Inc.
 2  # Licensed under the Apache License, Version 2.0 (the "License")
 3
 4  import pxtrace
 5  import px
 6
 7  # Adapted from https://github.com/iovisor/bpftrace/blob/master/tools/tcpretrans.bt
 8  program = """
 9  // tcpretrans.bt Trace or count TCP retransmits
10  // For Linux, uses bpftrace and eBPF.
11  //
12  // Copyright (c) 2018 Dale Hamel.
13  // Licensed under the Apache License, Version 2.0 (the "License")
14
15  #include <linux/socket.h>
16  #include <net/sock.h>
17
18  kprobe:tcp_retransmit_skb
19  {
20    $sk = (struct sock *)arg0;
21    $inet_family = $sk->__sk_common.skc_family;
22    $AF_INET = (uint16) 2;
23    $AF_INET6 = (uint16) 10;
24    if ($inet_family == $AF_INET || $inet_family == $AF_INET6) {
25      if ($inet_family == $AF_INET) {
26        $daddr = ntop($sk->__sk_common.skc_daddr);
27        $saddr = ntop($sk->__sk_common.skc_rcv_saddr);
28      } else {
29        $daddr = ntop($sk->__sk_common.skc_v6_daddr.in6_u.u6_addr8);
30        $saddr = ntop($sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr8);
31      }
32      $sport = $sk->__sk_common.skc_num;
33      $dport = $sk->__sk_common.skc_dport;
34      // Destination port is big endian, it must be flipped
35      $dport = ($dport >> 8) | (($dport << 8) & 0x00FF00);
36
37      printf(\"time_:%llu src_ip:%s src_port:%d dst_ip:%s dst_port:%d\",
38        nsecs,
39        $saddr,
40        $sport,
41        $daddr,
42        $dport);
43    }
44  }
45  """
46
47
48  def demo_func():
49      table_name = 'tcp_retransmits_table'
50      pxtrace.UpsertTracepoint('tcp_retranmits_probe',
51                               table_name,
52                               program,
53                               pxtrace.kprobe(),
54                               "10m")
55      # Rename columns
56      df = px.DataFrame(table=table_name,
57                        select=['time_', 'src_ip', 'src_port', 'dst_ip', 'dst_port'])
58
59      # Convert IPs to domain names.
60      df.src = px.pod_id_to_pod_name(px.ip_to_pod_id(df.src_ip))
61      df.dst = px.Service(px.nslookup(df.dst_ip))
62
63      # Count retransmits.
64      df = df.groupby(['src', 'dst']).agg(retransmits=('src_port', px.count))
65
66      # Filter for a particular service, if desired.
67      df = df[px.contains(df['dst'], '')]
68
69      return df
```
Pixie's scripts are written using the Pixie Language (PxL), a domain-specific language that is heavily influenced by the popular Python data processing library Pandas.
On line 8, we've included Dale Hamel's tcpretrans.bt bpftrace tool from the iovisor/bpftrace repo as a string. We've tweaked the original trace to work with Pixie's bpftrace rules (see the output requirements above):
- Edited `tcpretrans.bt` so that the program contains a single `printf` statement.
- Modified the `printf` statement on line 72 of `tcpretrans.bt` to name the output columns (no whitespace).
- Modified the `printf` statement on line 72 of `tcpretrans.bt` to output time using the reserved column name `time_` and passing it the `nsecs` argument.
Some further modifications were made to simplify the program for the purposes of this tutorial (for example, removing the TCP state), but those are not required changes.
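One detail of the trace that is easy to miss is the byte swap on line 35 of the script: the destination port is stored in network (big-endian) byte order, so on a little-endian host its two bytes must be exchanged before printing. The arithmetic can be checked in plain Python. This is only an illustration; `swap16` is a hypothetical helper name, and the example assumes a little-endian host.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value, as the trace does for the destination port."""
    return ((value >> 8) | ((value << 8) & 0xFF00)) & 0xFFFF


# Port 8080 (0x1F90) stored in network byte order, then read as a little-endian integer.
raw = int.from_bytes((8080).to_bytes(2, "big"), "little")

print(hex(raw))     # 0x901f
print(swap16(raw))  # 8080
```

The source port needs no such step in the script because `skc_num` is already held in host byte order.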
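On line 50, we call `UpsertTracepoint` with the following arguments: the tracepoint name (`'tcp_retranmits_probe'`), the name of the output table (`table_name`), the bpftrace program (`program`), the probe type (`pxtrace.kprobe()`), and a time-to-live for the tracepoint (`"10m"`).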
Lines 55-69 query the collected data, convert known IPs to pod and service names, and group the retransmits by source and destination, tallying the number of retransmits for each pair.
If you'd like to filter the results to a particular service, modify line 67 to include the namespace:
df = df[px.contains(df['dst'], 'sock-shop')]
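The grouping, counting, and filtering steps of the script can be pictured in plain Python. This is only an illustration of the logic, not PxL: it uses `collections.Counter`, and both the sample rows and the `count_retransmits` helper are made up for this sketch.

```python
from collections import Counter

# Made-up rows standing in for the table after IPs have been resolved to names.
rows = [
    {"src": "sock-shop/front-end-1", "dst": "sock-shop/catalogue"},
    {"src": "sock-shop/front-end-1", "dst": "sock-shop/catalogue"},
    {"src": "sock-shop/orders-1", "dst": "sock-shop/payment"},
    {"src": "default/load-test-1", "dst": "default/echo"},
]


def count_retransmits(rows, dst_contains=""):
    """Count rows per (src, dst) pair, keeping only destinations containing a substring."""
    # An empty substring matches every destination, so the default keeps all rows.
    return Counter(
        (row["src"], row["dst"]) for row in rows if dst_contains in row["dst"]
    )


print(count_retransmits(rows))
print(count_retransmits(rows, "sock-shop"))
```

Because the destination is one of the grouping keys, filtering before or after the count gives the same result.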
Run `px/tracepoint_status` to see information about all of the tracepoints running on your cluster. The `STATUS` column can be used to debug why a tracepoint fails to deploy.
The following bpftrace programs are available today for use in Pixie:
- `capable.bt`: use the `bpftrace/capable` script.
- `dcsnoop.bt`: use the `bpftrace/dc_snoop` script.
- `mdflush.bt`: use the `bpftrace/md_flush` script.
- `naptime.bt`: use the `bpftrace/nap_time` script.
- `oomkill.bt`: use the `bpftrace/oom_kill` script.
- `syncsnoop.bt`: use the `bpftrace/sync_snoop` script.
- `tcpdrop.bt`: use the `bpftrace/tcp_drops` script.
- `tcpretrans.bt`: use the `bpftrace/tcp_retransmits` script.
Many other bpftrace programs can work with Pixie. Some may require a few modifications to obey the rules listed above.
If you have any questions about this feature or how to incorporate your own bpftrace code, we'd be happy to help out over on our Slack.