Linux xargs command
Running processes in parallel
Xargs is often used to run several processes in parallel. For example, this compresses several directories into tar.gz archives at the same time:
$ printf '%s\n' dir1 dir2 dir3 | xargs -P 3 -I NAME tar czf NAME.tar.gz NAME
This example uses the -P option, which sets the maximum number of processes that run at the same time. Suppose the input contains 10 arguments. With -P 3, xargs keeps up to three instances of the command running at once, each handling one argument, and starts the next instance as soon as one finishes, until all 10 arguments have been processed. The directory names are supplied one per line because -I treats each input line as a single argument.
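The scheduling described above can be observed with a small sketch. It assumes an xargs that supports -P (GNU or BSD; the option is not in POSIX), and the sh -c worker is only an illustrative stand-in for a real job:

```shell
# Feed 10 arguments to xargs, one per line.
# -n 1 : pass one argument to each command invocation
# -P 3 : keep at most 3 invocations running at the same time
# The "sh -c" worker is a hypothetical stand-in for real work.
results=$(seq 1 10 | xargs -n 1 -P 3 sh -c 'echo "done $1"' sh | sort -k 2 -n)

# Completion order is not deterministic, so the lines are sorted
# before printing; every argument is processed exactly once.
echo "$results"
```

Sorting afterwards is a deliberate choice here: with parallel workers, the order in which lines arrive depends on timing.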
xargs can also be used to download many files from the Internet in parallel:
$ wget -nv -O - <link> | grep -Eo "http://[^[:space:]]*\.jpg" | xargs -P 10 -n 1 wget -nv
In this example, all image files with the jpg extension are downloaded from the given address; the -P option specifies that up to 10 files are downloaded at the same time.
Examples
One use case of the xargs command is to remove a list of files using the rm command. POSIX systems have an ARG_MAX for the maximum total length of the command line, so the command may fail with an error message of "Argument list too long" (meaning that the exec system call's limit on the length of a command line was exceeded): rm /path/* or rm $(find /path -type f). (The latter invocation is incorrect, as it may expand globs in the output.)
This can be rewritten using the xargs command to break the list of arguments into sublists small enough to be acceptable:
find /path -type f -print | xargs rm
In the above example, the find utility feeds the input of xargs with a long list of file names. xargs then splits this list into sublists and calls rm once for every sublist.
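The splitting into sublists can be made visible with a small dry run. This is a sketch: -n 2 forces tiny batches for illustration, and echo is placed in front of rm so nothing is deleted:

```shell
# -n 2 caps each invocation at two arguments, which makes the
# batching visible; without -n, xargs packs as many arguments as
# the system's command-line limit allows.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo rm)
echo "$batches"
# rm a b
# rm c d
# rm e
```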
xargs can also be used to parallelize operations with the -P maxprocs argument to specify how many parallel processes should be used to execute the commands over the input argument lists. However, the output streams may not be synchronized. This can be overcome by using an --output file argument where possible, and then combining the results after processing. The following example runs up to 24 processes at a time and launches a new one whenever a running one finishes.
find /path -name '*.foo' | xargs -P 24 -I '{}' /cpu/bound/process '{}' -o '{}'.out
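The idea of giving each worker its own output file and combining the results afterwards can be sketched as follows. The input names and the sh -c worker are hypothetical stand-ins for a real CPU-bound command, and -P is assumed to be available (GNU or BSD xargs):

```shell
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$workdir/inputs.txt"

# Each worker writes to its own file (NAME.out), so parallel
# writers never interleave their output.
xargs -P 3 -I '{}' sh -c 'echo "processed $1" > "$2/$1.out"' sh '{}' "$workdir" < "$workdir/inputs.txt"

# Combine the per-input results in a fixed order after all workers
# have finished (xargs waits for its children before exiting).
combined=$(cat "$workdir/alpha.out" "$workdir/beta.out" "$workdir/gamma.out")
echo "$combined"
rm -rf "$workdir"
```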
xargs often covers the same functionality as the command substitution feature of many shells, denoted by the notation `...` or $(...). xargs is also a good companion for commands that output long lists of files such as find, locate and grep, but only if one uses -0 (or equivalently --null), since xargs without -0 deals badly with file names containing ', " and space. GNU Parallel is a similar tool that offers better compatibility with find, locate and grep when file names may contain ', ", and space (newline still requires -0).
Separator problem
Many Unix utilities are line-oriented. These may work with xargs as long as the lines do not contain ', ", or a space. Some of the Unix utilities can use NUL as record separator (e.g. Perl (requires -0 and \0 instead of \n), locate (requires using -0), find (requires using -print0), grep (requires -z or -Z), sort (requires using -z)). Using -0 for xargs deals with the problem, but many Unix utilities cannot use NUL as separator (e.g. head, tail, ls, echo, sed, tar -v, wc, which).
But often people forget this and assume xargs is also line-oriented, which is not the case (by default xargs separates on newlines and blanks within lines; substrings with blanks must be single- or double-quoted).
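A short sketch of this default parsing, using echo so that each resulting argument appears on its own line:

```shell
# Two input lines, but default xargs splits on blanks as well as
# newlines; quotes protect embedded blanks and are then removed.
parsed=$(printf '%s\n' 'one two' '"three four"' | xargs -n 1 echo)
echo "$parsed"
# one
# two
# three four
```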
The separator problem is illustrated here:
# Make some targets to practice on
touch important_file
touch 'not important_file'
mkdir -p '12" records'

find . -name not\* | tail -1 | xargs rm
find \! -name . -type d | tail -1 | xargs rmdir
Running the above will cause important_file to be removed but will remove neither the directory called 12" records, nor the file called not important_file.
The proper fix is to use the GNU-specific -print0 option, but tail (and other tools) do not support NUL-terminated strings:
# use the same preparation commands as above
find . -name not\* -print0 | xargs -0 rm
find \! -name . -type d -print0 | xargs -0 rmdir
When using the -print0 option, entries are separated by a null character instead of an end-of-line. This is equivalent to the more verbose command:
find . -name not\* | tr \\n \\0 | xargs -0 rm
or shorter, by switching xargs to (non-POSIX) line-oriented mode with the -d (delimiter) option:
find . -name not\* | xargs -d '\n' rm
but in general using -0 with -print0 should be preferred, since newlines in filenames are still a problem.
GNU parallel is an alternative to xargs that is designed to have the same options, but is line-oriented. Thus, using GNU Parallel instead, the above would work as expected.
For Unix environments where xargs does not support the -0 nor the -d option (e.g. Solaris, AIX), the POSIX standard states that one can simply backslash-escape every character: find . -name not\* | sed 's/\(.\)/\\\1/g' | xargs rm. Alternatively, one can avoid using xargs at all, either by using GNU parallel or using the -exec ... + functionality of find.
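Both approaches can be tried safely in a scratch directory. The sketch below assumes a find and xargs that support -print0 and -0 (GNU and BSD do) and uses mktemp to create a throwaway directory; the file names mirror the practice targets from this section:

```shell
workdir=$(mktemp -d)
touch "$workdir/important_file" "$workdir/not important_file"
mkdir "$workdir/12\" records"

# NUL-separated names survive blanks and quotes intact.
find "$workdir" -name 'not*' -print0 | xargs -0 rm

# Alternative without xargs: find batches the names itself.
find "$workdir" -type d -name '12*' -exec rmdir {} +

remaining=$(ls "$workdir")
echo "$remaining"   # only important_file is left
rm -rf "$workdir"
```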
EXAMPLES
1. The following command combines the output of the parenthesized
commands (minus the <apostrophe> characters) onto one line, which
is then appended to the file log. It assumes that the expansion
of "$0 $*" does not include any <apostrophe> or <newline>
characters.
(logname; date; printf "'%s'\n" "$0 $*") | xargs -E "" >>log
2. The following command invokes diff with successive pairs of
arguments originally typed as command line arguments. It assumes
there are no embedded <newline> characters in the elements of the
original argument list.
printf "%s\n" "$@" | sed 's/[^[:alnum:]]/\\&/g' |
xargs -E "" -n 2 -x diff
3. In the following commands, the user is asked which files in the
current directory (excluding dotfiles) are to be archived. The
files are archived into arch; a, one at a time or b, many at a
time. The commands assume that no filenames contain <blank>,
<newline>, <backslash>, <apostrophe>, or double-quote characters.
a. ls | xargs -E "" -p -L 1 ar -r arch
b. ls | xargs -E "" -p -L 1 | xargs -E "" ar -r arch
4. The following command invokes command1 one or more times with
multiple arguments, stopping if an invocation of command1 has a
non-zero exit status.
xargs -E "" sh -c 'command1 "$@" || exit 255' sh < xargs_input
5. On XSI-conformant systems, the following command moves all files
from directory $1 to directory $2, and echoes each move command
just before doing it. It assumes no filenames contain <newline>
characters and that neither $1 nor $2 contains the sequence "{}".
ls -A "$1" | sed -e 's/"/"\\""/g' -e 's/.*/"&"/' |
xargs -E "" -I {} -t mv "$1"/{} "$2"/{}
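The stop-on-failure idiom from example 4 can be tried with a toy worker. In this sketch the item names and the "fail on bad" rule are made up for illustration; only the nonzero exit status of xargs is relied on, since the exact value differs between implementations:

```shell
# Each input item is handled by one sh invocation (-n 1). The worker
# fails on "bad"; "|| exit 255" turns that failure into the status
# that tells xargs to stop reading further input.
processed=$(printf '%s\n' ok1 bad ok2 |
  xargs -n 1 sh -c '{ [ "$1" != bad ] && echo "ran $1"; } || exit 255' sh 2>/dev/null)
status=$?
echo "$processed"              # ran ok1
echo "xargs status: $status"   # nonzero; ok2 was never processed
```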
ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of
xargs:
LANG Provide a default value for the internationalization
variables that are unset or null. (See the Base Definitions
volume of POSIX.1‐2008, Section 8.2, Internationalization
Variables for the precedence of internationalization
variables used to determine the values of locale
categories.)
LC_ALL If set to a non-empty string value, override the values of
all the other internationalization variables.
LC_COLLATE
Determine the locale for the behavior of ranges,
equivalence classes, and multi-character collating elements
used in the extended regular expression defined for the
yesexpr locale keyword in the LC_MESSAGES category.
LC_CTYPE Determine the locale for the interpretation of sequences of
bytes of text data as characters (for example, single-byte
as opposed to multi-byte characters in arguments and input
files) and the behavior of character classes used in the
extended regular expression defined for the yesexpr locale
keyword in the LC_MESSAGES category.
LC_MESSAGES
Determine the locale used to process affirmative responses,
and the locale used to affect the format and contents of
diagnostic messages and prompts written to standard error.
NLSPATH Determine the location of message catalogs for the
processing of LC_MESSAGES.
PATH Determine the location of utility, as described in the Base
Definitions volume of POSIX.1‐2008, Chapter 8, Environment
Variables.
Encoding problem
The argument separator processing of xargs is not the only problem with using the xargs program in its default mode. Most Unix tools which are often used to manipulate filenames (for example sed, basename, sort, etc.) are text processing tools. However, Unix path names are not really text. Consider a path name /aaa/bbb/ccc. The /aaa directory and its bbb subdirectory can in general be created by different users with different environments. That means these users could have a different locale setup, and that means that aaa and bbb do not even necessarily have to have the same character encoding. For example, aaa could be in UTF-8 and bbb in Shift JIS. As a result, an absolute path name in a Unix system may not be correctly processable as text under a single character encoding. Tools which rely on their input being text may fail on such strings.
One workaround for this problem is to run such tools in the C locale, which essentially processes the bytes of the input as-is. However, this will change the behavior of the tools in ways the user may not expect (for example, some of the user’s expectations about case-folding behavior may not be met).
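As a small illustration of how the C locale changes behavior, the sketch below sorts four letters bytewise. In many other locales the same input would be ordered case-insensitively instead:

```shell
# In the C locale, sort compares raw byte values, so all uppercase
# ASCII letters order before lowercase ones.
bytewise=$(printf '%s\n' b A a B | LC_ALL=C sort)
echo "$bytewise"
# A
# B
# a
# b
```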
Other Common Uses
3.1. Limit Output per Line
To find out what xargs does with the output from the find command, let’s add echo before the rm command:
Because we added the -n 1 argument, xargs turns each line into a command of its own.
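A minimal sketch of such a dry run, assuming a scratch directory with two hypothetical log files (file1.log and file2.log are made-up names):

```shell
workdir=$(mktemp -d)
touch "$workdir/file1.log" "$workdir/file2.log"

# echo in front of rm turns the destructive command into a dry run;
# -n 1 makes xargs emit one command per input item.
preview=$(cd "$workdir" && find . -name '*.log' | sort | xargs -n 1 echo rm)
echo "$preview"
# rm ./file1.log
# rm ./file2.log
rm -rf "$workdir"
```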
3.2. Enable User Prompt Before Execution
To prompt the user with y (yes) and n (no) options before execution, let’s use the -p option:
Since we answered y at the prompt, file5.log has now been removed:
3.3. Insert Arguments at a Particular Position
The xargs command offers options to insert the listed arguments at some arbitrary position other than the end of the command line.
The -I option takes a string that gets replaced with the supplied input before the command executes. Although this string can be any set of characters, a common choice for it is “%”.
Let’s move file6.log to the backup directory:
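One way this could look, sketched in a scratch directory so that nothing real is moved. The directory layout is an assumption, and -maxdepth is a GNU/BSD find extension:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/backup"
touch "$workdir/file6.log"

# -I % replaces every % in the command with the input line, so the
# file name can sit in the middle of the mv command line.
(cd "$workdir" && find . -maxdepth 1 -name 'file6.log' | xargs -I % mv % backup/)

moved=$(ls "$workdir/backup")
echo "$moved"   # file6.log
rm -rf "$workdir"
```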
3.4. Enable Multiple Process Usage
To parallelize operations, we can use the -P option to specify the number of parallel processes used in executing the commands over the input argument list.
Let’s use it to encode a series of wav files to mp3 format in parallel:
The encoding process executes using three processes, since -P 3 is specified.
OPTIONS
The xargs utility shall conform to the Base Definitions volume of
POSIX.1-2008, Section 12.2, Utility Syntax Guidelines.
The following options shall be supported:
-E eofstr Use eofstr as the logical end-of-file string. If -E is not
specified, it is unspecified whether the logical end-of-file
string is the <underscore> character ('_') or the end-of-file
string capability is disabled. When eofstr is the
null string, the logical end-of-file string capability
shall be disabled and <underscore> characters shall be
taken literally.
-I replstr
Insert mode: utility is executed for each logical line from
standard input. Arguments in the standard input shall be
separated only by unescaped <newline> characters, not by
<blank> characters. Any unquoted unescaped <blank>
characters at the beginning of each line shall be ignored.
The resulting argument shall be inserted in arguments in
place of each occurrence of replstr. At least five
arguments in arguments can each contain one or more
instances of replstr. Each of these constructed arguments
cannot grow larger than an implementation-defined limit
greater than or equal to 255 bytes. Option -x shall be
forced on.
-L number The utility shall be executed for each non-empty number
lines of arguments from standard input. The last invocation
of utility shall be with fewer lines of arguments if fewer
than number remain. A line is considered to end with the
first <newline> unless the last character of the line is a
<blank>; a trailing <blank> signals continuation to the
next non-empty line, inclusive.
-n number Invoke utility using as many standard input arguments as
possible, up to number (a positive decimal integer)
arguments maximum. Fewer arguments shall be used if:
* The command line length accumulated exceeds the size
specified by the -s option (or {LINE_MAX} if there is
no -s option).
* The last iteration has fewer than number, but not zero,
operands remaining.
-p Prompt mode: the user is asked whether to execute utility
at each invocation. Trace mode (-t) is turned on to write
the command instance to be executed, followed by a prompt
to standard error. An affirmative response read from
/dev/tty shall execute the command; otherwise, that
particular invocation of utility shall be skipped.
-s size Invoke utility using as many standard input arguments as
possible yielding a command line length less than size (a
positive decimal integer) bytes. Fewer arguments shall be
used if:
* The total number of arguments exceeds that specified by
the -n option.
* The total number of lines exceeds that specified by the
-L option.
* End-of-file is encountered on standard input before
size bytes are accumulated.
Values of size up to at least {LINE_MAX} bytes shall be
supported, provided that the constraints specified in the
DESCRIPTION are met. It shall not be considered an error if
a value larger than that supported by the implementation or
exceeding the constraints specified in the DESCRIPTION is
given; xargs shall use the largest value it supports within
the constraints.
-t Enable trace mode. Each generated command line shall be
written to standard error just prior to invocation.
-x Terminate if a constructed command line will not fit in the
implied or specified size (see the -s option above).
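Two of the options above, -E and -L, can be sketched with echo as the utility. The marker string STOP is an arbitrary choice made for this illustration:

```shell
# -E STOP: input after the logical end-of-file string is ignored.
eof_demo=$(printf '%s\n' a b STOP c | xargs -E STOP echo)
echo "$eof_demo"     # a b

# -L 2: run the utility once per two non-empty input lines.
line_demo=$(printf '%s\n' 'a b' c d | xargs -L 2 echo)
echo "$line_demo"
# a b c
# d
```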
DESCRIPTION
This manual page documents the GNU version of xargs. xargs reads
items from the standard input, delimited by blanks (which can be
protected with double or single quotes or a backslash) or newlines,
and executes the command (default is /bin/echo) one or more times
with any initial-arguments followed by items read from standard
input. Blank lines on the standard input are ignored.
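The default behaviour described in this paragraph can be seen by running xargs with no command at all:

```shell
# No command given, so xargs runs echo. Runs of blanks and the blank
# line are dropped; quotes protect the embedded blank in "c d".
joined=$(printf 'a   b\n\n"c d"\n' | xargs)
echo "$joined"   # a b c d
```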
The command line for command is built up until it reaches a system-
defined limit (unless the -n and -L options are used). The specified
command will be invoked as many times as necessary to use up the list
of input items. In general, there will be many fewer invocations of
command than there were items in the input. This will normally have
significant performance benefits. Some commands can usefully be
executed in parallel too; see the -P option.
Because Unix filenames can contain blanks and newlines, this default
behaviour is often problematic; filenames containing blanks and/or
newlines are incorrectly processed by xargs. In these situations it
is better to use the -0 option, which prevents such problems. When
using this option you will need to ensure that the program which
produces the input for xargs also uses a null character as a
separator. If that program is GNU find for example, the -print0
option does this for you.
If any invocation of the command exits with a status of 255, xargs
will stop immediately without reading any further input. An error
message is issued on stderr when this happens.
A Typical Example
Sometimes, we may need to pass the output of one command as input for another. Some commands can take input either as command-line arguments or from the standard input.
However, there are others like cp, rm, and echo that can only take input as arguments. In such situations, we can use xargs to convert input coming from standard input to arguments.
Let’s now see an example that illustrates this.
Let’s say we need to remove all log files, older than seven days, from this log directory:
To find files older than seven days, we’ll use the find command with -mtime option:
Now, let’s use the pipe operator in order to send the output of find as the input for rm:
This prints an error message since rm expects arguments and can’t read them from STDIN.
Listing the log directory still shows the same number of files, as the rm command failed:
Instead, let’s use xargs along with rm:
Now rm removes files older than seven days.
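The whole walk-through can be reproduced in a scratch directory. This is a sketch under stated assumptions: the file names old.log and new.log are made up, and backdating with touch -d '10 days ago' relies on GNU touch:

```shell
workdir=$(mktemp -d)
touch "$workdir/new.log"
# "touch -d" with a relative date is a GNU extension (assumption).
touch -d '10 days ago' "$workdir/old.log"

# rm ignores standard input, so piping find straight into rm fails.
find "$workdir" -name '*.log' -mtime +7 | rm 2>/dev/null
direct_status=$?

# xargs turns the piped file names into arguments for rm.
find "$workdir" -name '*.log' -mtime +7 | xargs rm

left=$(ls "$workdir")
echo "rm via pipe exited with $direct_status; remaining: $left"
rm -rf "$workdir"
```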
Let’s now see other common scenarios for xargs.