Linux xargs command

Running processes in parallel

Xargs is often used to run several processes in parallel. For example, this is how several directories can be compressed into tar.gz archives at the same time:

$ printf '%s\n' dir1 dir2 dir3 | xargs -P 3 -I NAME tar czf NAME.tar.gz NAME

The example above uses the -P option, which sets the maximum number of processes that run at the same time. Suppose the input contains 10 arguments. If we run xargs with -P 3 and one argument per invocation (which -I implies, since it runs the command once per input line), up to 3 instances of the command following xargs run at once, and a new instance starts whenever one finishes, until all 10 arguments have been processed.
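A small sketch of this behaviour, assuming an xargs that supports -P (the GNU and BSD versions do; POSIX.1-2008 does not require it). The `sh -c` wrapper and the "item" labels are illustrative only.

```shell
# Feed 10 arguments to xargs, one per invocation (-n 1),
# with at most 3 invocations running at the same time (-P 3).
# Completion order is not deterministic, so the output is sorted afterwards.
results=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
    xargs -P 3 -n 1 sh -c 'echo "item $1"' sh | sort -k2 -n)
echo "$results"
```

All 10 items are processed; -P only limits how many of them are in flight at any moment.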

Xargs can also be used to download many files from the Internet in parallel:

$ wget -nv -O - <link> | grep -Eo 'http://[^[:space:]"]*\.jpg' | xargs -P 10 -n 1 wget -nv

In this example, all image files with the .jpg extension that are linked from the given address are downloaded; the -P option makes xargs download up to 10 files at a time.

Examples

One use case of the xargs command is to remove a list of files using the rm command. POSIX systems have an ARG_MAX limit on the maximum total length of the command line, so an rm invocation that receives a very long file list, whether from a shell glob or from command substitution, may fail with the error message "Argument list too long" (meaning that the exec system call's limit on the length of a command line was exceeded). (The command-substitution form is also incorrect, as the shell may expand globs in the output.)

This can be rewritten using the xargs command to break the list of arguments into sublists small enough to be acceptable:

find /path -type f -print | xargs rm

In the above example, the find utility feeds the input of xargs with a long list of file names. xargs then splits this list into sublists and calls rm once for every sublist.
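The splitting can be made visible with a short sketch. Here -n 4 is used only to force small sublists, and echo stands in for a real rm so that nothing is deleted; both are choices made for this illustration.

```shell
# -n 4 caps each sublist at 4 arguments so the splitting is easy to see;
# without -n, xargs packs as many arguments as the system limit allows.
batches=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | xargs -n 4 echo rm)
echo "$batches"
# prints:
# rm 1 2 3 4
# rm 5 6 7 8
# rm 9 10
```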

xargs can also be used to parallelize operations with the -P argument, which specifies how many parallel processes should be used to execute the commands over the input argument lists. However, the output streams may not be synchronized. This can be overcome by using an output-file argument where possible (-o in the example below), and then combining the results after processing. The following example runs up to 24 processes at a time and launches another whenever one finishes.

find /path -name '*.foo' | xargs -P 24 -I '{}' /cpu/bound/process '{}' -o '{}'.out
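The same pattern in a form that can be run as-is, as a sketch: the `sh -c` wrapper stands in for the CPU-bound program, and the item names and scratch directory are assumptions made for the illustration.

```shell
outdir=$(mktemp -d)
# Each parallel job writes to its own file, so outputs cannot interleave.
printf '%s\n' alpha beta gamma |
    xargs -P 3 -I '{}' sh -c 'echo "processed $1" > "$2/$1.out"' sh '{}' "$outdir"
# xargs returns only after every job has finished; now combine the results.
combined=$(cat "$outdir/alpha.out" "$outdir/beta.out" "$outdir/gamma.out")
echo "$combined"
rm -r "$outdir"
```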

xargs often covers the same functionality as the command substitution feature of many shells, denoted by the notation `...` or $(...). xargs is also a good companion for commands that output long lists of files such as find, locate and grep, but only if one uses -0 (or equivalently --null), since xargs without -0 deals badly with file names containing ', " and space. GNU Parallel is a similar tool that offers better compatibility with find, locate and grep when file names may contain ', ", and space (newline still requires -0).

Separator problem

Many Unix utilities are line-oriented. These may work with xargs as long as the lines do not contain ', ", or a space. Some Unix utilities can use NUL as the record separator (e.g. Perl (requires -0 and \0 instead of \n), locate (requires using -0), find (requires using -print0), grep (requires -z or -Z), sort (requires using -z)). Using -0 for xargs deals with the problem, but many Unix utilities cannot use NUL as a separator.

But people often forget this and assume xargs is also line-oriented, which is not the case (by default xargs separates on newlines and on blanks within lines; substrings with blanks must be single- or double-quoted).
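A minimal sketch of this default splitting, using made-up input; -n 1 is added only so that each resulting item lands on its own line.

```shell
# The first line holds two blank-separated items; the quotes on the second
# line protect the blank, so "three four" stays a single item.
items=$(printf 'one two\n"three four"\n' | xargs -n 1 echo)
echo "$items"
# prints:
# one
# two
# three four
```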

The separator problem is illustrated here:

# Make some targets to practice on
touch important_file
touch 'not important_file'
mkdir -p '12" records'

find . -name not\* | tail -1 | xargs rm
find \! -name . -type d | tail -1 | xargs rmdir

Running the above will cause important_file to be removed but will remove neither the directory called 12" records, nor the file called not important_file.

The proper fix is to use the GNU-specific -print0 option, but tail (and other tools) do not support NUL-terminated strings:

# use the same preparation commands as above
find . -name not\* -print0 | xargs -0 rm
find \! -name . -type d -print0 | xargs -0 rmdir

When using the -print0 option, entries are separated by a null character instead of an end-of-line. The same result can be reached with a more verbose pipeline that converts newlines to null characters, or more briefly by switching xargs to (non-POSIX) line-oriented mode with the -d (delimiter) option, but in general using -0 with -print0 should be preferred, since newlines in filenames are still a problem.
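The two alternatives mentioned above could look like the following sketch. It assumes GNU xargs (for -0 and -d) and works in a scratch directory created with mktemp; the file names repeat the practice targets from earlier.

```shell
cd "$(mktemp -d)"
touch important_file 'not important_file'

# More verbose: turn each newline into a NUL, then let xargs -0 split on NUL.
find . -name 'not*' | tr '\n' '\0' | xargs -0 rm

touch 'not important_file'
# Shorter (GNU xargs only): split on newlines, take blanks and quotes literally.
find . -name 'not*' | xargs -d '\n' rm

ls   # only important_file is left
```

Both variants still misbehave when a file name itself contains a newline, which is why -print0 with -0 remains the safer choice.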

GNU parallel is an alternative to xargs that is designed to have the same options, but is line-oriented. Thus, using GNU Parallel instead, the above would work as expected.

For Unix environments where xargs does not support the -0 nor the -d option (e.g. Solaris, AIX), the POSIX standard states that one can simply backslash-escape every character. Alternatively, one can avoid using xargs at all, either by using GNU parallel or by using the -exec ... {} + functionality of find.
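A sketch of both approaches, again in a scratch directory with the practice file names from earlier. The sed expression is one possible way to escape every character and is an assumption of this example, not a quotation from the standard.

```shell
cd "$(mktemp -d)"
touch important_file 'not important_file'

# Portable: put a backslash before every character so that xargs takes
# blanks and quotes literally (newlines in names still break this).
find . -name 'not*' | sed 's/./\\&/g' | xargs rm

touch 'not important_file'
# No xargs at all: find batches the names itself and passes them to rm.
find . -name 'not*' -exec rm {} +

ls   # only important_file is left
```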

EXAMPLES

        1. The following command combines the output of the parenthesized
           commands (minus the <apostrophe> characters) onto one line, which
           is then appended to the file log. It assumes that the expansion
           of "$0 $*" does not include any <apostrophe> or <newline>
           characters.

               (logname; date; printf "'%s'\n" "$0 $*") | xargs -E "" >>log

        2. The following command invokes diff with successive pairs of
           arguments originally typed as command line arguments. It assumes
           there are no embedded <newline> characters in the elements of the
           original argument list.

               printf "%s\n" "$@" | sed 's/[^[:alnum:]]/\\&/g' |
                   xargs -E "" -n 2 -x diff

        3. In the following commands, the user is asked which files in the
           current directory (excluding dotfiles) are to be archived. The
           files are archived into arch; a, one at a time or b, many at a
           time. The commands assume that no filenames contain <blank>,
           <newline>, <backslash>, <apostrophe>, or double-quote characters.

               a. ls | xargs -E "" -p -L 1 ar -r arch

               b. ls | xargs -E "" -p -L 1 | xargs -E "" ar -r arch

        4. The following command invokes command1 one or more times with
           multiple arguments, stopping if an invocation of command1 has a
           non-zero exit status.

               xargs -E "" sh -c 'command1 "$@" || exit 255' sh < xargs_input

        5. On XSI-conformant systems, the following command moves all files
           from directory $1 to directory $2, and echoes each move command
           just before doing it. It assumes no filenames contain <newline>
           characters and that neither $1 nor $2 contains the sequence "{}".

               ls -A "$1" | sed -e 's/"/"\\""/g' -e 's/.*/"&"/' |
                   xargs -E "" -I {} -t mv "$1"/{} "$2"/{}

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of
       xargs:

       LANG      Provide a default value for the internationalization
                 variables that are unset or null. (See the Base Definitions
                 volume of POSIX.1‐2008, Section 8.2, Internationalization
                 Variables for the precedence of internationalization
                 variables used to determine the values of locale
                 categories.)

       LC_ALL    If set to a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale for the behavior of ranges,
                 equivalence classes, and multi-character collating elements
                 used in the extended regular expression defined for the
                 yesexpr locale keyword in the LC_MESSAGES category.

       LC_CTYPE  Determine the locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte
                 as opposed to multi-byte characters in arguments and input
                 files) and the behavior of character classes used in the
                 extended regular expression defined for the yesexpr locale
                 keyword in the LC_MESSAGES category.

       LC_MESSAGES
                 Determine the locale used to process affirmative responses,
                 and the locale used to affect the format and contents of
                 diagnostic messages and prompts written to standard error.

       NLSPATH   Determine the location of message catalogs for the
                 processing of LC_MESSAGES.

       PATH      Determine the location of utility, as described in the Base
                 Definitions volume of POSIX.1‐2008, Chapter 8, Environment
                 Variables.

Encoding problem

The argument separator processing of xargs is not the only problem with using the xargs program in its default mode. Most Unix tools which are often used to manipulate filenames are text processing tools. However, Unix path names are not really text. Consider a path name /aaa/bbb/ccc. The /aaa directory and its bbb subdirectory can in general be created by different users with different environments. That means these users could have a different locale setup, and that means that aaa and bbb do not even necessarily have to have the same character encoding. For example, aaa could be in UTF-8 and bbb in Shift JIS. As a result, an absolute path name in a Unix system may not be correctly processable as text under a single character encoding. Tools which rely on their input being text may fail on such strings.

One workaround for this problem is to run such tools in the C locale, which essentially processes the bytes of the input as-is. However, this will change the behavior of the tools in ways the user may not expect (for example, some of the user’s expectations about case-folding behavior may not be met).
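How the C locale changes behaviour can be seen with sort, used here purely as an illustration on made-up input: in the C locale it compares raw bytes, which may differ from the ordering a user is accustomed to in their own locale.

```shell
# In the C locale sort compares raw bytes, so every uppercase ASCII letter
# comes before every lowercase one.
bytewise=$(printf '%s\n' b A a B | LC_ALL=C sort)
echo "$bytewise"
# prints:
# A
# B
# a
# b
```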

Other Common Uses

3.1. Limit Output per Line

To find out what xargs does with the output from the find command, let’s add echo before the rm command:
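A sketch of such a command, assuming a scratch directory with three hypothetical log files (the directory and file names are illustrative, not taken from an original listing):

```shell
logdir=$(mktemp -d)
touch "$logdir/file1.log" "$logdir/file2.log" "$logdir/file3.log"

# echo shows what xargs would run; -n 1 limits each command to one argument.
preview=$(find "$logdir" -name '*.log' | sort | xargs -n 1 echo rm)
echo "$preview"
```

Each output line is one complete rm command that xargs would have executed.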

Because we added the -n 1 argument, xargs turns each line into a command of its own.

3.2. Enable User Prompt Before Execution

To prompt the user with y (yes) and n (no) options before execution, let’s use the -p option:

Since we have provided the ‘y’ option, file5.log has now been removed:

3.3. Insert Arguments at a Particular Position

The xargs command offers options to insert the listed arguments at some arbitrary position other than the end of the command line.

The -I option takes a string that gets replaced with the supplied input before the command executes. Although this string can be any set of characters, a common choice for it is “%”.

Let’s move file6.log to the backup directory:
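One way this could look, as a sketch in a scratch directory; the backup directory and file6.log are created here so the example is self-contained.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/backup"
touch "$workdir/file6.log"

# % marks where each input line is inserted, here in the middle of the command.
find "$workdir" -name 'file6.log' | xargs -I % mv % "$workdir/backup/"
ls "$workdir/backup"
```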

3.4. Enable Multiple Process Usage

To parallelize operations, we can use the -P option to specify the number of parallel processes used in executing the commands over the input argument list.

Let’s use it to parallel encode a series of wav files to mp3 format:

The encoding process executes using three processes, since -P 3 is specified.

OPTIONS

       The xargs utility shall conform to the Base Definitions volume of
       POSIX.1‐2008, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -E eofstr Use eofstr as the logical end-of-file string. If -E is not
                 specified, it is unspecified whether the logical end-of-
                 file string is the <underscore> character ('_') or the end-
                 of-file string capability is disabled. When eofstr is the
                 null string, the logical end-of-file string capability
                 shall be disabled and <underscore> characters shall be
                 taken literally.

       -I replstr
                 Insert mode: utility is executed for each logical line from
                 standard input. Arguments in the standard input shall be
                 separated only by unescaped <newline> characters, not by
                 <blank> characters. Any unquoted unescaped <blank>
                 characters at the beginning of each line shall be ignored.
                 The resulting argument shall be inserted in arguments in
                 place of each occurrence of replstr.  At least five
                 arguments in arguments can each contain one or more
                 instances of replstr.  Each of these constructed arguments
                 cannot grow larger than an implementation-defined limit
                 greater than or equal to 255 bytes. Option -x shall be
                 forced on.

       -L number The utility shall be executed for each non-empty number
                 lines of arguments from standard input. The last invocation
                 of utility shall be with fewer lines of arguments if fewer
                 than number remain. A line is considered to end with the
                 first <newline> unless the last character of the line is a
                 <blank>; a trailing <blank> signals continuation to the
                 next non-empty line, inclusive.

       -n number Invoke utility using as many standard input arguments as
                 possible, up to number (a positive decimal integer)
                 arguments maximum. Fewer arguments shall be used if:

                  *  The command line length accumulated exceeds the size
                     specified by the -s option (or {LINE_MAX} if there is
                     no -s option).

                  *  The last iteration has fewer than number, but not zero,
                     operands remaining.

       -p        Prompt mode: the user is asked whether to execute utility
                 at each invocation. Trace mode (-t) is turned on to write
                 the command instance to be executed, followed by a prompt
                 to standard error. An affirmative response read from
                 /dev/tty shall execute the command; otherwise, that
                 particular invocation of utility shall be skipped.

       -s size   Invoke utility using as many standard input arguments as
                 possible yielding a command line length less than size (a
                 positive decimal integer) bytes. Fewer arguments shall be
                 used if:

                  *  The total number of arguments exceeds that specified by
                     the -n option.

                  *  The total number of lines exceeds that specified by the
                     -L option.

                  *  End-of-file is encountered on standard input before
                     size bytes are accumulated.

                 Values of size up to at least {LINE_MAX} bytes shall be
                 supported, provided that the constraints specified in the
                 DESCRIPTION are met. It shall not be considered an error if
                 a value larger than that supported by the implementation or
                 exceeding the constraints specified in the DESCRIPTION is
                 given; xargs shall use the largest value it supports within
                 the constraints.

       -t        Enable trace mode. Each generated command line shall be
                 written to standard error just prior to invocation.

       -x        Terminate if a constructed command line will not fit in the
                 implied or specified size (see the -s option above).
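A few of the options above in action, as a sketch with made-up input; echo is used as the utility so that the grouping of arguments is visible, and the marker word STOP is an arbitrary choice.

```shell
# -n 2: at most two arguments per invocation.
by_n=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
# -L 2: one invocation per two input lines, however many arguments they hold.
by_L=$(printf 'a b\nc d\ne f\n' | xargs -L 2 echo)
# -E STOP: input after the logical end-of-file string STOP is ignored.
by_E=$(printf '%s\n' a b STOP c | xargs -E STOP echo)
# -t: the generated command line is written to standard error before it runs;
# here standard error is captured and echo's own output is discarded.
trace=$(printf 'x y\n' | xargs -t echo 2>&1 >/dev/null)

printf '%s\n' "$by_n" "$by_L" "$by_E" "$trace"
```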

DESCRIPTION

       This manual page documents the GNU version of xargs.  xargs reads
       items from the standard input, delimited by blanks (which can be
       protected with double or single quotes or a backslash) or newlines,
       and executes the command (default is /bin/echo) one or more times
       with any initial-arguments followed by items read from standard
       input.  Blank lines on the standard input are ignored.

       The command line for command is built up until it reaches a system-
       defined limit (unless the -n and -L options are used).  The specified
       command will be invoked as many times as necessary to use up the list
       of input items.  In general, there will be many fewer invocations of
       command than there were items in the input.  This will normally have
       significant performance benefits.  Some commands can usefully be
       executed in parallel too; see the -P option.

       Because Unix filenames can contain blanks and newlines, this default
       behaviour is often problematic; filenames containing blanks and/or
       newlines are incorrectly processed by xargs.  In these situations it
       is better to use the -0 option, which prevents such problems.   When
       using this option you will need to ensure that the program which
       produces the input for xargs also uses a null character as a
       separator.  If that program is GNU find for example, the -print0
       option does this for you.

       If any invocation of the command exits with a status of 255, xargs
       will stop immediately without reading any further input.  An error
       message is issued on stderr when this happens.
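The effect of exit status 255 can be sketched as follows. The `sh -c` wrapper and the rule "fail on item 2" are invented for the illustration; the exact non-zero status that xargs itself returns afterwards depends on the implementation, so it is only captured here, not interpreted.

```shell
status=0
# The wrapper exits with 255 on the item "2", so xargs never runs it for "3".
seen=$(printf '%s\n' 1 2 3 |
    xargs -n 1 sh -c 'echo "got $1"; [ "$1" != 2 ] || exit 255' sh 2>/dev/null) ||
    status=$?
echo "$seen"
# prints:
# got 1
# got 2
echo "xargs exit status: $status"
```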

A Typical Example

Sometimes, we may need to pass the output of one command as input for another. Some commands can take input either as command-line arguments or from the standard input.

However, there are others like cp, rm, and echo that can only take input as arguments. In such situations, we can use xargs to convert input coming from standard input to arguments.

Let’s now see an example that illustrates this.

Let’s say we need to remove all log files, older than seven days, from this log directory:

To find files older than seven days, we’ll use the find command with -mtime option:

Now, let’s use the pipe operator in order to send the output of find as the input for rm:

This prints an error message since rm expects arguments and can’t read them from STDIN.

Listing the log directory still shows the same number of files, as the rm command failed:

Instead, let’s use xargs along with rm:
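A self-contained sketch of this step. The scratch directory, the file names old.log and new.log, and the use of GNU touch -d to backdate a file are all assumptions made so that the example can be run safely.

```shell
logdir=$(mktemp -d)
touch "$logdir/new.log"
# Backdate one file; "-d '10 days ago'" is a GNU touch extension.
touch -d '10 days ago' "$logdir/old.log"

# rm ignores standard input, so xargs turns find's output into arguments.
find "$logdir" -name '*.log' -mtime +7 | xargs rm
ls "$logdir"   # only new.log remains
```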

Now rm removes files older than seven days.

Let’s now see other common scenarios for xargs.
