Linux xargs command
Running processes in parallel
xargs is often used to run several processes in parallel. For example, this is how you can compress several directories into tar.gz archives at the same time:
$ echo dir1 dir2 dir3 | xargs -P 3 -I NAME tar czf NAME.tar.gz NAME
The example above uses the -P option, which sets the maximum number of processes that run at the same time. Suppose the input contains 10 arguments and each command receives one argument (as with -I above, or with -n 1). With -P 3, xargs starts at most 3 instances of the command that follows it, and each time one of them finishes it starts the next, until every argument has been processed.
xargs can also be used to download many files from the Internet in parallel:
$ wget -nv -O - <URL> | grep -Eo 'http://[^[:space:]"]*\.jpg' | xargs -P 10 -n 1 wget -nv
Here, every .jpg image linked from the given page is downloaded. The -O - option makes the first wget write the page to standard output so grep can extract the links, -n 1 passes one URL to each wget, and -P 10 lets up to 10 downloads run at once.
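The scenario just described (10 arguments, -P 3) can be checked with a minimal sketch. It assumes `seq` is available (GNU or BSD), and `sleep 1` stands in for real work. With three workers, the run should take roughly 4 seconds rather than the 10 a sequential run would need.

```shell
#!/bin/sh
# 10 arguments, at most 3 concurrent jobs (-P 3), one argument per job (-n 1).
# Each job sleeps 1 second to stand in for real work.
set -eu

start=$(date +%s)
out=$(seq 10 | xargs -P 3 -n 1 sh -c 'sleep 1; echo "done $1"' sh)
elapsed=$(( $(date +%s) - start ))

# Jobs finish in no fixed order, so sort numerically on the second field.
printf '%s\n' "$out" | sort -k2,2n
echo "elapsed: ${elapsed}s (a sequential run would take about 10s)"
```

The output order of parallel jobs is not guaranteed, which is why the sketch sorts before printing.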
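The download pipeline can be tried safely as a dry run. This sketch greps a small HTML sample (invented, with `example.com` as a placeholder host) instead of fetching a live page, and `echo` makes xargs print the wget commands it would run rather than running them.

```shell
#!/bin/sh
# Dry run: extract .jpg URLs from a made-up HTML sample and print the
# wget commands xargs would launch (echo prevents any real download).
set -eu

page='<img src="http://example.com/a.jpg"> <img src="http://example.com/b.jpg">
<a href="http://example.com/doc.pdf">doc</a>'

cmds=$(printf '%s\n' "$page" \
  | grep -Eo 'http://[^[:space:]"]*\.jpg' \
  | xargs -P 10 -n 1 echo wget -nv)

# Parallel jobs may print in any order, so sort for a stable view.
printf '%s\n' "$cmds" | sort
```

Removing `echo` turns the dry run into the real download.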
Examples
One use case of the xargs command is to remove a list of files using the rm command. POSIX systems have an ARG_MAX for the maximum total length of the command line, so the command may fail with an error message of «Argument list too long» (meaning that the exec system call’s limit on the length of a command line was exceeded): rm /path/* or rm $(find /path -type f). (The latter invocation is incorrect, as it may expand globs in the output.)
This can be rewritten using the xargs command to break the list of arguments into sublists small enough to be acceptable:
find /path -type f -print | xargs rm
In the above example, the find utility feeds the input of xargs with a long list of file names. xargs then splits this list into sublists and calls rm once for every sublist.
xargs can also be used to parallelize operations with the -P argument, which specifies how many parallel processes should be used to execute the commands over the input argument lists. However, the output streams may not be synchronized. This can be overcome by giving each process its own output file where the command supports it (the -o option of the placeholder /cpu/bound/process below), and then combining the results after processing. The following example runs up to 24 processes at once and starts a new one whenever a running process finishes.
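The splitting into sublists can be made visible with a small sketch. Here `-n 3` caps each sublist at three arguments so the batches are easy to see. Without `-n`, xargs sizes the batches by the command-line length limit instead, and `getconf ARG_MAX` reports the relevant system limit.

```shell
#!/bin/sh
# Show how xargs splits its input into sublists: -n 3 allows at most
# three arguments per invocation of echo.
set -eu

batches=$(seq 10 | xargs -n 3 echo)
printf '%s\n' "$batches"

# The system-wide limit that motivates the splitting in the first place.
echo "ARG_MAX on this system: $(getconf ARG_MAX)"
```

Each output line corresponds to one separate invocation of echo.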
find /path -name '*.foo' | xargs -P 24 -I '{}' /cpu/bound/process '{}' -o '{}'.out
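The "one output file per job, combine afterwards" pattern can be sketched as follows. `wc -c` stands in for the placeholder /cpu/bound/process, and the scratch directory and file names are invented. The sketch passes each file name to `sh -c` as `$1` (with `-n 1` and `-0`) instead of using `-I '{}'`, so names are never spliced into the script text.

```shell
#!/bin/sh
# Each parallel job writes only to its own '<file>.out', so outputs never
# interleave; the per-job results are merged afterwards in a fixed order.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'alpha\n' > "$dir/a.foo"
printf 'beta\n'  > "$dir/b.foo"
printf 'gamma\n' > "$dir/c.foo"

# wc -c is a stand-in for a CPU-bound command with its own output file.
find "$dir" -name '*.foo' -print0 \
  | xargs -0 -P 24 -n 1 sh -c 'wc -c < "$1" > "$1.out"' sh

# Combine: glob order is sorted, so the merged result is deterministic.
combined=$(for f in "$dir"/*.foo.out; do
  printf '%s %s\n' "$(basename "$f" .out)" "$(tr -d ' ' < "$f")"
done)
printf '%s\n' "$combined"
```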
xargs often covers the same functionality as the command substitution feature of many shells, denoted by the notation (`...` or $(...)). xargs is also a good companion for commands that output long lists of files such as find, locate and grep, but only if one uses -0 (or equivalently --null), since xargs without -0 deals badly with file names containing ', " and space. GNU Parallel is a similar tool that offers better compatibility with find, locate and grep when file names may contain ', ", and space (newline still requires -0).
Separator problem
Many Unix utilities are line-oriented. These may work with xargs as long as the lines do not contain ', ", or a space. Some Unix utilities can use NUL as a record separator (e.g. Perl (requires -0 and \0 instead of \n), locate (requires using -0), find (requires using -print0), grep (requires -z or -Z), sort (requires using -z)). Using -0 for xargs deals with the problem, but many Unix utilities cannot use NUL as a separator.
But people often forget this and assume xargs is also line-oriented, which is not the case: by default xargs separates on newlines and on blanks within lines, and substrings with blanks must be single- or double-quoted.
The separator problem is illustrated here:
# Make some targets to practice on
touch important_file
touch 'not important_file'
mkdir -p '12" records'
find . -name not\* | tail -1 | xargs rm
find \! -name . -type d | tail -1 | xargs rmdir
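Running the above will cause important_file to be removed, but will remove neither the directory called 12" records nor the file called not important_file.
The proper fix is to use the -print0 option (originally a GNU extension), but tail (and many other tools) may not support NUL-terminated strings, so the tail -1 step is dropped here: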
# use the same preparation commands as above
find . -name not\* -print0 | xargs -0 rm
find \! -name . -type d -print0 | xargs -0 rmdir
When using the -print0 option, entries are separated by a null character instead of an end-of-line. This is equivalent to the more verbose command:
find . -name not\* | tr \\n \\0 | xargs -0 rm
or, shorter, by switching xargs to (non-POSIX) line-oriented mode with the -d (delimiter) option:
find . -name not\* | xargs -d '\n' rm
but in general using -0 with -print0 should be preferred, since newlines in filenames are still a problem.
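The practice targets from the separator example can be exercised end to end in a throwaway directory. This sketch compares the default mode with NUL-separated input, and assumes a find that supports -print0 and an xargs that supports -0. Errors from the deliberately broken default-mode runs are silenced.

```shell
#!/bin/sh
# Compare default (blank/newline-separated) xargs input with NUL-separated input.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch important_file 'not important_file'
mkdir -p '12" records'

# Default mode: "./not important_file" is split at the blank into "./not"
# and "important_file", so rm deletes the wrong file (and fails on "./not").
find . -name 'not*' | xargs rm 2>/dev/null || true
if [ -e important_file ]; then naive_kept=yes; else naive_kept=no; fi

# Default mode: the lone double quote in '12" records' makes xargs report
# an unmatched quote instead of running rmdir.
find . ! -name . -type d | xargs rmdir 2>/dev/null || true

# NUL-separated mode: names arrive intact, blanks and quotes included.
touch important_file
find . -name 'not*' -print0 | xargs -0 rm
find . ! -name . -type d -print0 | xargs -0 rmdir

echo "default mode kept important_file: $naive_kept"
ls
```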
GNU Parallel is an alternative to xargs that is designed to have the same options, but is line-oriented. Thus, using GNU Parallel instead, the above would work as expected.
For Unix environments where xargs supports neither the -0 nor the -d option (e.g. Solaris, AIX), the POSIX standard states that one can simply backslash-escape every character: find . -name not\* | sed 's/\(.\)/\\\1/g' | xargs rm. Alternatively, one can avoid using xargs altogether, either by using GNU Parallel or by using the -exec ... + functionality of find.
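Both portable alternatives can be tried in a scratch directory. The file names are invented for this sketch: one name contains a blank, and another contains both a blank and an apostrophe.

```shell
#!/bin/sh
# Two approaches that need neither xargs -0 nor -d.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch 'not important_file' "it's not mine" important_file

# (1) sed prefixes every character with a backslash, so xargs takes
#     blanks and quotes literally.
find . -name 'not*' | sed 's/\(.\)/\\\1/g' | xargs rm

# (2) No xargs at all: find batches the arguments itself with -exec ... +.
find . -name '*not*' -exec rm {} +

remaining=$(ls)
printf '%s\n' "$remaining"
```

Of the two, `-exec ... +` is usually simpler, because no text re-parsing happens between find and the command.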
EXAMPLES
1. The following command combines the output of the parenthesized commands (minus the <apostrophe> characters) onto one line, which is then appended to the file log. It assumes that the expansion of "$0 $*" does not include any <apostrophe> or <newline> characters.
(logname; date; printf "'%s'\n" "$0 $*") | xargs -E "" >>log
2. The following command invokes diff with successive pairs of arguments originally typed as command line arguments. It assumes there are no embedded <newline> characters in the elements of the original argument list.
printf "%s\n" "$@" | sed 's/[^[:alnum:]]/\\&/g' | xargs -E "" -n 2 -x diff
3. In the following commands, the user is asked which files in the current directory (excluding dotfiles) are to be archived. The files are archived into arch either (a) one at a time or (b) many at a time. The commands assume that no filenames contain <blank>, <newline>, <backslash>, <apostrophe>, or double-quote characters.
a. ls | xargs -E "" -p -L 1 ar -r arch
b. ls | xargs -E "" -p -L 1 | xargs -E "" ar -r arch
4. The following command invokes command1 one or more times with multiple arguments, stopping if an invocation of command1 has a non-zero exit status.
xargs -E "" sh -c 'command1 "$@" || exit 255' sh < xargs_input
5. On XSI-conformant systems, the following command moves all files from directory $1 to directory $2, and echoes each move command just before doing it. It assumes no filenames contain <newline> characters and that neither $1 nor $2 contains the sequence "{}".
ls -A "$1" | sed -e 's/"/"\\""/g' -e 's/.*/"&"/' | xargs -E "" -I {} -t mv "$1"/{} "$2"/{}
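The technique in example 4 maps any failure to exit status 255, which tells xargs to stop launching further commands. In this sketch, "command1" is replaced by a made-up check that fails on the input "bad". The -E "" from the POSIX example is left out, and the exact non-zero status xargs returns is implementation-specific, so only "non-zero" is assumed.

```shell
#!/bin/sh
# A failing invocation exits with 255, so xargs stops and never runs "ok2".
set -u

out=$(printf 'ok1\nbad\nok2\n' \
  | xargs -n 1 sh -c '[ "$1" != bad ] && echo "processed $1" || exit 255' sh 2>/dev/null)
status=$?

printf '%s\n' "$out"
echo "xargs exit status: $status"
```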
ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of xargs:
LANG
Provide a default value for the internationalization variables that are unset or null. (See the Base Definitions volume of POSIX.1-2008, Section 8.2, Internationalization Variables for the precedence of internationalization variables used to determine the values of locale categories.)
LC_ALL
If set to a non-empty string value, override the values of all the other internationalization variables.
LC_COLLATE
Determine the locale for the behavior of ranges, equivalence classes, and multi-character collating elements used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.
LC_CTYPE
Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files) and the behavior of character classes used in the extended regular expression defined for the yesexpr locale keyword in the LC_MESSAGES category.
LC_MESSAGES
Determine the locale used to process affirmative responses, and the locale used to affect the format and contents of diagnostic messages and prompts written to standard error.
NLSPATH
Determine the location of message catalogs for the processing of LC_MESSAGES.
PATH
Determine the location of utility, as described in the Base Definitions volume of POSIX.1-2008, Chapter 8, Environment Variables.
Encoding problem
The argument separator processing of xargs is not the only problem with using the xargs program in its default mode. Most Unix tools that are often used to manipulate filenames (for example sed, basename, sort, etc.) are text processing tools. However, Unix path names are not really text. Consider a path name /aaa/bbb/ccc. The /aaa directory and its bbb subdirectory can in general be created by different users with different environments. That means these users could have a different locale setup, and that means that aaa and bbb do not even necessarily have to have the same character encoding. For example, aaa could be in UTF-8 and bbb in Shift JIS. As a result, an absolute path name in a Unix system may not be correctly processable as text under a single character encoding. Tools which rely on their input being text may fail on such strings.
One workaround for this problem is to run such tools in the C locale, which essentially processes the bytes of the input as-is. However, this will change the behavior of the tools in ways the user may not expect (for example, some of the user’s expectations about case-folding behavior may not be met).
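A quick illustration of how the C locale changes behaviour, using sort on invented input. In the C locale, sort compares raw byte values, so every uppercase ASCII letter sorts before every lowercase one. A typical UTF-8 locale usually interleaves upper and lower case instead, which is exactly the kind of difference users may not expect.

```shell
#!/bin/sh
# Byte-wise ordering under the C locale: 'A' (65) < 'B' (66) < 'a' (97) < 'b' (98).
set -eu

sorted=$(printf 'b\nB\na\nA\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```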
Other Common Uses
3.1. Limit Output per Line
To find out what xargs does with the output from the find command, let’s add echo before the rm command:
Because we added the -n 1 argument, xargs turns each line into a command of its own.
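A minimal sketch of this dry run, assuming a scratch directory with invented log names. The "old" files get a 2020 timestamp via POSIX `touch -t`. Because of the echo, nothing is actually removed, and -n 1 produces one rm line per file.

```shell
#!/bin/sh
# Dry run: echo turns each generated rm command into printed text.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Two "old" logs (January 2020) and one fresh log.
touch -t 202001010000 file1.log file2.log
touch file3.log

plan=$(find . -name '*.log' -mtime +7 | sort | xargs -n 1 echo rm)
printf '%s\n' "$plan"
```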
3.2. Enable User Prompt Before Execution
To prompt the user with y (yes) and n (no) options before execution, let’s use the -p option:
Since we answered ‘y’ at the prompt, file5.log has now been removed:
3.3. Insert Arguments at a Particular Position
The xargs command offers options to insert the listed arguments at some arbitrary position other than the end of the command line.
The -I option takes a string that gets replaced with the supplied input before the command executes. Although this string can be any set of characters, a common choice for it is “%”.
Let’s move file6.log to the backup directory:
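A sketch of the move using -I %, with invented directory names. The log lives in a separate logs directory so that find cannot run into the file again after it has been moved into backup. Each input line replaces %, so the file name lands in the middle of the command rather than at the end.

```shell
#!/bin/sh
# -I % substitutes each input line wherever % appears in the command.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir logs backup
touch logs/file6.log

find logs -name 'file6.log' | xargs -I % mv % backup/
ls backup
```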
3.4. Enable Multiple Process Usage
To parallelize operations, we can use the -P option to specify the number of parallel processes used in executing the commands over the input argument list.
Let’s use it to encode a series of wav files to mp3 format in parallel:
The encoding runs in up to three processes at once, since -P 3 is specified.
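A sketch of the parallel encoding pattern. It assumes a real run would call an MP3 encoder such as lame. Here `cp` to a `.mp3` name stands in for the encoder so the sketch runs anywhere, and the track files are invented. `-n 1` gives each file its own process, which is what lets `-P 3` spread the work across three workers.

```shell
#!/bin/sh
# Three workers, one file each; cp is a placeholder for the real encoder.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

for i in 1 2 3 4 5; do printf 'pcm data %s\n' "$i" > "track$i.wav"; done

find . -name '*.wav' -print0 \
  | xargs -0 -P 3 -n 1 sh -c 'cp "$1" "${1%.wav}.mp3"' sh

ls *.mp3
```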
OPTIONS
The xargs utility shall conform to the Base Definitions volume of POSIX.1-2008, Section 12.2, Utility Syntax Guidelines. The following options shall be supported:
-E eofstr
Use eofstr as the logical end-of-file string. If -E is not specified, it is unspecified whether the logical end-of-file string is the <underscore> character ('_') or the end-of-file string capability is disabled. When eofstr is the null string, the logical end-of-file string capability shall be disabled and <underscore> characters shall be taken literally.
-I replstr
Insert mode: utility is executed for each logical line from standard input. Arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters. Any unquoted unescaped <blank> characters at the beginning of each line shall be ignored. The resulting argument shall be inserted in arguments in place of each occurrence of replstr. At least five arguments in arguments can each contain one or more instances of replstr. Each of these constructed arguments cannot grow larger than an implementation-defined limit greater than or equal to 255 bytes. Option -x shall be forced on.
-L number
The utility shall be executed for each non-empty number lines of arguments from standard input. The last invocation of utility shall be with fewer lines of arguments if fewer than number remain. A line is considered to end with the first <newline> unless the last character of the line is a <blank>; a trailing <blank> signals continuation to the next non-empty line, inclusive.
-n number
Invoke utility using as many standard input arguments as possible, up to number (a positive decimal integer) arguments maximum. Fewer arguments shall be used if:
* The command line length accumulated exceeds the size specified by the -s option (or {LINE_MAX} if there is no -s option).
* The last iteration has fewer than number, but not zero, operands remaining.
-p
Prompt mode: the user is asked whether to execute utility at each invocation. Trace mode (-t) is turned on to write the command instance to be executed, followed by a prompt to standard error. An affirmative response read from /dev/tty shall execute the command; otherwise, that particular invocation of utility shall be skipped.
-s size
Invoke utility using as many standard input arguments as possible yielding a command line length less than size (a positive decimal integer) bytes. Fewer arguments shall be used if:
* The total number of arguments exceeds that specified by the -n option.
* The total number of lines exceeds that specified by the -L option.
* End-of-file is encountered on standard input before size bytes are accumulated.
Values of size up to at least {LINE_MAX} bytes shall be supported, provided that the constraints specified in the DESCRIPTION are met. It shall not be considered an error if a value larger than that supported by the implementation or exceeding the constraints specified in the DESCRIPTION is given; xargs shall use the largest value it supports within the constraints.
-t
Enable trace mode. Each generated command line shall be written to standard error just prior to invocation.
-x
Terminate if a constructed command line will not fit in the implied or specified size (see the -s option above).
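A small sketch contrasting three of the options above on invented input. -n counts arguments, -L counts input lines, and -E makes xargs ignore everything after the logical end-of-file string. The sketch assumes an xargs that accepts -E with a non-empty string, as GNU and BSD xargs do.

```shell
#!/bin/sh
set -eu
input='a b
c'

# -n 1: one argument per invocation, regardless of line boundaries.
by_arg=$(printf '%s\n' "$input" | xargs -n 1 echo)

# -L 1: one input line per invocation; "a b" stays together.
by_line=$(printf '%s\n' "$input" | xargs -L 1 echo)

# -E STOP: input after the logical end-of-file string is ignored.
until_eof=$(printf 'a\nb\nSTOP\nc\n' | xargs -E STOP echo)

printf '%s\n---\n%s\n---\n%s\n' "$by_arg" "$by_line" "$until_eof"
```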
DESCRIPTION
This manual page documents the GNU version of xargs. xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored. The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). The specified command will be invoked as many times as necessary to use up the list of input items. In general, there will be many fewer invocations of command than there were items in the input. This will normally have significant performance benefits. Some commands can usefully be executed in parallel too; see the -P option. Because Unix filenames can contain blanks and newlines, this default behaviour is often problematic; filenames containing blanks and/or newlines are incorrectly processed by xargs. In these situations it is better to use the -0 option, which prevents such problems. When using this option you will need to ensure that the program which produces the input for xargs also uses a null character as a separator. If that program is GNU find for example, the -print0 option does this for you. If any invocation of the command exits with a status of 255, xargs will stop immediately without reading any further input. An error message is issued on stderr when this happens.
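A minimal sketch of the default behaviour described above, on invented input. With no command given, xargs runs echo. Blank input lines are skipped, and quotes keep "x y" together as a single item.

```shell
#!/bin/sh
set -eu

# No command: items are passed to echo; the blank line is ignored.
joined=$(printf 'a\n\nb c\n' | xargs)

# Quoted substrings survive as one item; -n 1 shows each item on its own line.
items=$(printf '"x y" z\n' | xargs -n 1)

printf '%s\n---\n%s\n' "$joined" "$items"
```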
A Typical Example
Sometimes, we may need to pass the output of one command as input for another. Some commands can take input either as command-line arguments or from the standard input.
However, there are others like cp, rm, and echo that can only take input as arguments. In such situations, we can use xargs to convert input coming from standard input to arguments.
Let’s now see an example that illustrates this.
Let’s say we need to remove all log files older than seven days from this log directory:
To find files older than seven days, we’ll use the find command with the -mtime option:
Now, let’s use the pipe operator in order to send the output of find as the input for rm:
This prints an error message since rm expects arguments and can’t read them from STDIN.
Listing the log directory still shows the same number of files, as the rm command failed:
Instead, let’s use xargs along with rm:
Now rm removes files older than seven days.
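The whole walkthrough can be condensed into one sketch, assuming a scratch log directory with invented file names. The "old" logs are backdated to 2020 with POSIX `touch -t`. Piping straight into rm fails because rm ignores standard input, while xargs turns the piped names into rm's arguments.

```shell
#!/bin/sh
set -eu
logdir=$(mktemp -d)
trap 'rm -rf "$logdir"' EXIT

touch -t 202001010000 "$logdir/old1.log" "$logdir/old2.log"
touch "$logdir/new.log"

# Without xargs: rm receives no arguments and fails; nothing is removed.
find "$logdir" -name '*.log' -mtime +7 | rm 2>/dev/null \
  || echo "rm without xargs failed, as expected"

# With xargs: the file names become rm's arguments.
find "$logdir" -name '*.log' -mtime +7 | xargs rm

ls "$logdir"
```

For names that may contain blanks or newlines, the `-print0` / `xargs -0` pairing from the separator discussion is the safer variant.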
Let’s now see other common scenarios for xargs.