기본 콘텐츠로 건너뛰기

netlink: memory mapped I/O

The following patches contain an implementation of memory mapped I/O for
netlink, rebased onto the current net-next tree. The implementation is
modelled after AF_PACKET memory mapped I/O with a few differences:

- In order to perform memory mapped I/O to userspace, the kernel allocates
  skbs with the data area pointing to the data area of the mapped frames.
  All netlink subsystems assume a linear data area, so for the sake of
  simplicity, the mapped data area is not attached to the paged area but
  to skb->data. This requires introduction of a special skb alloction
  function that just allocates an skb head without the data area. Since this
  is a quite rare use case, I introduced a new function based on __alloc_skb
  instead of splitting it up into head and data alloction. The alternative
  would be to   introduce an __alloc_skb_head and __alloc_skb_data function,
  which would actually be useful for a specific error case in memory mapped
  netlink, but would require a couple of extra instructions for the common
  skb allocation case, so it doesn't really seem worth it.

  In order to get the destination memory area for skb->data before message
  construction, memory mapped netlink I/O needs to look up the destination
  socket during allocation instead of during transmission because the
  ring is owned by the receiveing socket/process. A special skb allocation
  function (netlink_alloc_skb) taking the destination pid as an argument is
  used for this, all subsystems that want to support memory mapped I/O need
  to use this function, automatic fallback to the receive queue happens
  for unconverted subsystems. Dumps automatically use memory mapped I/O if
  the receiving socket has enabled it.

  The visible effect of looking up the destination socket during allocation
  instead of transmission is that message ordering in userspace might
  change in case allocation and transmission aren't performed atomically.
  This usually doesn't matter since most subsystems have a BKL-like lock
  like the rtnl mutex, to my knowledge the currently only existing case
  where it might matter is nfnetlink_queue combined with the recently
  introduced batched verdicts, but a) that subsystem already includes
  sequence numbers which allow userspace to reorder messages in case it
  cares to, also the reodering window is quite small and b) with memory
  mapped transmission batching can be performed in a subsystem indepandant
  manner.

- AF_NETLINK contains flow control for database dumps, with regular I/O
  dump continuation are triggered based on the sockets receive queue space
  and by recvmsg() calls. Since with memory mapped I/O there are no
  recvmsg() calls under normal operation, this is done in netlink_poll(),
  under the assumption that userspace has processed all pending frames
  before invoking poll(), thus the ring is expected to have room for new
  messages. Dumps currently don't benefit as much as they could from
  memory mapped I/O because each single continuation requires a poll()
  call. A more agressive approach seems like a good idea to me, especially
  in case the socket is not subscribed to any multicast groups (IOW only
  receiving explicitly requested data).

Besides that, the memory mapped netlink implementation extends the states
defined by AF_PACKET between userspace and the kernel by a SKIP status, this
is intended for the case that userspace wants to queue frames (specifically
when using nfnetlink_queue, an IDS and stream reassembly, requested by
Eric Leblond) for a longer period of time. The kernel skips over all frames
marked with SKIP when looking or unused frames and only fails when not finding
a free frame or when having skipped the entire ring.

Also noteworthy is memory mapped sendmsg: the kernel performs validation
of messages before accepting and processing them, in order to prevent
userspace from changing the messages contents after validation, the
kernel checks that the ring is only mapped once and the file descriptor
is not shared (in order to avoid having userspace set up another mapping
after the first mentioned check). If either of both is not true, the
message copied to an allocated skb and processed as with regular I/O.

As an example, nfnetlink_queue is convererted to support memory mapped
I/O. Other subsystems that would probably benefit are nfnetlink_log,
audit and maybe ISCSI.

Since the last posting only a minor bug in ring teardown has been fixed.
I'm still working with Florian on solving the nfnetlink_queue ordering
issue, besides that there are no known issues.

Some older performance numbers with nfnetlink_queue from Florian:

nfq recv: regular netlink I/O
mnl recv: mmap'ed netlink I/O
batch: number of batched verdicts

1400 byte UDP packets, 8 cores, NFQUEUE balancing using 4 queues, only
mmap'ed RX, regular TX, userspace running inline Snort with all rules
and preprocessors enabled:

nfq recv, batch 0  1250 MBit total rx
mnl recv, batch 0  1230 MBit total rx
nfq recv, batch 10 1590 MBit total rx
mnl recv, batch 10 1770 MBit total rx

I'll try to get some new numbers soon, including TX and dumps.

I'll try to get some new numbers soon, including TX and dumps.


The patches are available in a git tree at:

git://github.com/kaber/netlink-mmap master

once git push has finished.


Patrick McHardy (11):
      netlink: add symbolic value for congested state
      net: add function to allocate skbuff head without data area
      netlink: don't orphan skb in netlink_trim()
      netlink: add netlink_skb_set_owner_r()
      netlink: mmaped netlink: ring setup
      netlink: add mmap'ed netlink helper functions
      netlink: implement memory mapped sendmsg()
      netlink: implement memory mapped recvmsg()
      nfnetlink: add support for memory mapped netlink
      netlink: add flow control for memory mapped I/O
      netlink: add documentation for memory mapped I/O

 Documentation/networking/netlink_mmap.txt |  337 ++++++++++++
 include/linux/netfilter/nfnetlink.h       |    2 +
 include/linux/netlink.h                   |   42 ++
 include/linux/skbuff.h                    |    6 +
 net/Kconfig                               |    9 +
 net/core/skbuff.c                         |   31 +-
 net/netfilter/nfnetlink.c                 |    7 +
 net/netfilter/nfnetlink_log.c             |    9 +-
 net/netfilter/nfnetlink_queue_core.c      |    2 +-
 net/netlink/af_netlink.c                  |  849 +++++++++++++++++++++++++++--
 10 files changed, 1250 insertions(+), 44 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

댓글

이 블로그의 인기 게시물

톰캣 세션 타임 아웃 설정

web.xml 파일이 있습니다. 이 파일을 열어서 session이라고 검색해 보십시오. <session-config>   <session-timeout>360</session-timeout> </session-config> 위 단락을 찾을 수 있습니다. session-timeout 시간 360이 바로 자동로그아웃 세션 시간입니다.  단위는 분이고요. 30분으로 하고 싶으시면 30으로 바꿔서 저장해주시면 되는 것이죠~ Tomcat 내에서 Session Timeout 를 설정하는 우선 순위가 존재 한다. session.setMaxInactiveInterval() 프로그램내에서 time out 를 설정했을 경우 Web application 내의 WEB-INF/web.xml Tomcat 내의 conf/web.xml 실제로 Tomcat(conf/web.xml)내에 Default 로 설정되어 있는 것은 다음과 같다. < HttpSession 메서드 > getCreationTime() - 세션 생성 시간 getLastAccessedTime() - 마지막 요청 시간 setMaxInactiveInterval() - 최대허용시간 설정 (초) getMaxInactiveInterval() - 최대허용시간 invalidate() - 세션 제거 < 타임아웃 설정하기 > - 일정 시간 동안 요청이 없으면 세션을 제거한다  1. DD에서 전체 세션 타임아웃 설정       web.xml 1. DD에서 전체 세션 타임아웃 설정       web.xml <web-app ... >     <servlet>        ...

java 특정 디렉토리에 있는 파일 목록을 읽어내기, 정렬해서 가져오기

폴더 리스트 가져오기 String path="C:\"; File dirFile=new File(path); File []fileList=dirFile.listFiles(); for(File tempFile : fileList) {   if(tempFile.isFile()) {     String tempPath=tempFile.getParent();     String tempFileName=tempFile.getName();     System.out.println("Path="+tempPath);     System.out.println("FileName="+tempFileName);     /*** Do something withd tempPath and temp FileName ^^; ***/   } } 정렬해서 가져오기 import java.io.FileFilter; import java.io.IOException; import java.util.Arrays; import java.util.Date; import org.apache.commons.io.comparator.LastModifiedFileComparator; import org.apache.commons.io.filefilter.FileFileFilter; public class LastModifiedFileComparatorTest { public static void main(String[] args) throws IOException { File directory = new File("."); // get just files, not directories File[] files = directory.listFiles((FileFilter) FileFileFilter.FILE); System.out.println("Defaul...

dmesg 메시지 실시간으로 보기

참조사이트 http://imitator.kr/Linux/556 # tail -f /var/log/messages # tail -f |dmesg //기본 2초 단위로 갱신 된다. # watch "dmesg | tail -f" //1초 단위로 갱신하면서 보여준다. # watch -n 1 "dmesg | tail -f" // 보여주는 줄을 20으로 늘린다. (기본 10줄) # watch -n 1 "dmesg | tail -f -n 20"