Scans multiple files looking for a regex pattern, and summarises what it finds as a CSV file.

    java.exe -jar C:\com\mindprod\pluck\pluck.jar "\.[a-z]+\." E:\temp\temp.csv E:\somedir

or

    java.exe -jar C:\com\mindprod\pluck\pluck.jar "\.[a-z]+\." console E:\somedir\somefile.txt C:\temp -s G:\myApps

adjusting as necessary to account for where the jar file is.

The first parameter is the regex pattern. See http://mindprod.com/jgloss/regex.html for how to compose one.

The next parameter is where the output is to go. Use the word console to have the output appear on the console.

Then put on the command line the list of files and directories you want to scan, where -s means recursively include all subdirectories of everything to the right of the -s.

It will look only at files with *.html, *.htm, *.xml and *.txt extensions. You can't change that via the command line, though you could modify the program.

The command line does not currently support wildcards, e.g. ap*.txt or ff?.html. You need to specify the full names of files or directories, or . to mean all the files in the current directory.

When you write your regex, you don't double your \. You are not creating a Java string literal.

It can be tricky to get various characters in your regex passed through to the Pluck program.

Awkward Characters on the Windows Command Line

Char    Special Meaning         How to pass it through to the program
******  **********************  ***************************************
space   separates parameters    enclose the whole parameter in quotes "
<       input redirection       enclose the whole parameter in quotes "
>       output redirection      enclose the whole parameter in quotes "
|       pipe                    enclose the whole parameter in quotes "
'       none                    enclose the whole parameter in quotes "
"       parameter delimiter     write it as \" and enclose the whole parameter in quotes "
%       macro replace           write it as %%

If you are using Linux bash, or other Bourne compatible or csh compatible shell, enclose your regex in single quotes, '...'.
Then the only character you need to worry about inside the regex is ' itself, which needs to be encoded as '\'' i.e. apostrophe, backslash, apostrophe, apostrophe. (In csh compatible shells, ! can also trigger history substitution even inside single quotes; write it as \!.) The '\'' encoding seems rather long-winded. What you are doing is ending the quoted string, adding a literal apostrophe (quoting it with \ much as you would in a Java string literal), then starting the quoted string up again. The shell concatenates all three pieces.

Pluck echoes the regex exactly as the command processor handed it over. Verify that none of your characters were mangled.

If you are having trouble solving your problem with Pluck and regexes, you might precondition the files by converting all newlines and control chars to spaces, by running the files to be scanned through a tidy program, such as:
http://tidy.sourceforge.net/
or by running HTML to be scanned through the Compactor program. See:
http://mindprod.com/products1.html#COMPACTOR

You might consider sifting through malformed HTML with TagSoup. See:
http://mindprod.com/jgloss/tagsoup.html

You might even write a full-blown parser. See:
http://mindprod.com/jgloss/parser.html
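
The quoting and preconditioning advice above can be sketched in Java. This is only an illustration and not part of Pluck: the class PluckHelpers and its methods quoteForBash and flattenControlChars are hypothetical names invented for this example. It shows the '\'' encoding for single-quoted bash parameters, a one-for-one replacement of control characters by spaces, and the fact that a regex typed on the command line as \.[a-z]+\. has to be written with doubled backslashes only when it appears inside Java source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for illustration only; they are not part of Pluck. */
final class PluckHelpers {

    /** Wrap a regex in single quotes for bash, encoding each ' as '\'' . */
    static String quoteForBash(String regex) {
        // String.replace is a literal replacement, not a regex one.
        return "'" + regex.replace("'", "'\\''") + "'";
    }

    /** Replace each newline, tab or other control character with one space. */
    static String flattenControlChars(String text) {
        return text.replaceAll("\\p{Cntrl}", " ");
    }

    public static void main(String[] args) {
        // On the command line you would type \.[a-z]+\. with single backslashes.
        // The doubling below is needed only because this is a Java string literal.
        String regex = args.length > 0 ? args[0] : "\\.[a-z]+\\.";

        // Echo the regex as received, so mangled characters are easy to spot.
        System.out.println("regex as received: " + regex);
        System.out.println("quoted for bash:   " + quoteForBash(regex));

        // Precondition some sample text, then scan it with the regex.
        String sample = flattenControlChars("see www.mindprod.com\nfor details");
        Matcher m = Pattern.compile(regex).matcher(sample);
        while (m.find()) {
            System.out.println("found: " + m.group());
        }
    }
}
```

Running the class with no arguments uses the sample regex from the examples above. Passing your own regex as the first argument lets you check how your shell delivers it before you hand the same text to Pluck.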