Back in September I wrote about a problem we’d come across when capturing traffic with pcap_dispatch() or pcap_next_ex() on Ubuntu Trusty or Debian Jessie. When the traffic was slow, we saw packets not being captured.
We’ve since done a bit more digging. The problem, we think, is a bug in the Linux kernel select() system call. With both pcap_dispatch() and pcap_next_ex() we’re using a central loop that is basically:
pcap_dispatch(); select(pcapfd, timeout);
The length of timeout in the select() call shouldn’t matter. But it does. In our test scenario, set it to 1ms and every packet of a ping over an otherwise idle network connection is captured. Set it to 2s and most or all are missed.
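The select() half of that loop can be sketched as below. This is only an illustration, not our production code: wait_readable() is a hypothetical helper name, and the handle and callback names in the comment are placeholders. The timeout is given in milliseconds, so the 1ms and 2s cases above correspond to 1 and 2000.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: block until fd is readable or timeout_ms elapses.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Split milliseconds into the seconds/microseconds pair select() wants. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/*
 * In a capture loop this would sit after the dispatch call, roughly:
 *
 *     for (;;) {
 *         pcap_dispatch(handle, -1, callback, NULL);
 *         wait_readable(pcap_get_selectable_fd(handle), timeout_ms);
 *     }
 *
 * where handle is an open pcap_t and callback is your packet handler
 * (both placeholders here).
 */
```

On an unaffected kernel the value of timeout_ms should only change how often the loop wakes when idle, not whether packets are seen.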
Robert Edmonds has suggested that it’s this kernel bug. Thanks, Robert – that looks like the problem to us. The bug was fixed in kernel 3.19. We’ve filed a Debian bug and an Ubuntu bug.
So, what can you do about it for now?
- If using Ubuntu Trusty, consider switching to the LTS Enablement Stack. This has the fix applied.
- If using Debian Jessie, consider switching to a 4.9 series kernel from Jessie backports.
- Otherwise consider reducing the timeout in your call to select(). As noted above, this certainly improves the situation for our specific test scenario. However, we can’t be confident that it is a definitive fix; make sure you test your particular circumstances.
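If the same binary has to run on both affected and fixed kernels, one option is to pick the timeout from the kernel release string. The sketch below is a heuristic only. choose_timeout_ms() is a hypothetical helper, the 1ms and 2s values are simply the ones from our test scenario, and a version check cannot tell whether a distribution has backported the fix to an older kernel.

```c
#include <stdio.h>

/*
 * Hypothetical helper: choose a select() timeout in milliseconds from a
 * kernel release string, such as the release field filled in by uname(2).
 * Kernels before 3.19 get the short timeout as a workaround.
 */
long choose_timeout_ms(const char *release)
{
    int major = 0, minor = 0;

    /* Unparseable release string: be conservative and use the short timeout. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 1;

    /* 3.19 and later carry the fix, so a long timeout is assumed safe. */
    if (major > 3 || (major == 3 && minor >= 19))
        return 2000;

    return 1;
}
```

For example, a stock Trusty release string such as "3.13.0-24-generic" gives 1, while a 4.9 backports kernel gives 2000.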