a suite of four utilities to scan a set of files looking for a set of strings:

extract  - displays all lines in all files with a match on one or more strings, case-sensitive. (precise)
extracti - displays all lines in all files with a match on one or more strings, case-insensitive. (relaxed)
without  - displays all lines in all files with no match on any of the strings, case-sensitive. (precise)
withouti - displays all lines in all files with no match on any of the strings, case-insensitive. (relaxed)

They work on text and HTML files.

To find matching lines in a single file, type:

    java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - myfile.html

With extract, everything is case-sensitive. The results display on the console. You can redirect them in the usual way with > results.txt

You can also list several files on the command line:

    java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - myfile.html C:\mydir\another.html

To search all the files in the current directory:

    java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - .

Sorry, no wildcards, just . and ..

DON'T USE WILDCARDS (*.xxx) unless you deeply understand how they work. See http://mindprod.com/jgloss/wildcard.html. Wildcards are expanded before the utility ever sees them, so the utility receives a giant list of all the directories and files in the current directory. The utility will thus tend to process all the files in your directories, when you just meant to process the files in the current directory.

The -s switch makes the utility search every directory that follows it on the command line recursively, including all its subdirectories, e.g.

    java.exe -jar C:\com\mindprod\extract\extract.jar "wombat" "zebra" - -s E:\mindprod

will scan all files in the mindprod directory tree.
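
The four utilities differ in only two respects: whether case matters, and whether matching or non-matching lines are displayed. The rule can be sketched in a few lines of Java. This is an illustration only, not code from Extract.java; the class name LineFilterSketch and its methods are hypothetical, and plain substring matching is assumed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LineFilterSketch {
    /** True if the line contains at least one of the search strings. */
    static boolean matchesAny(String line, List<String> needles, boolean ignoreCase) {
        String hay = ignoreCase ? line.toLowerCase(Locale.ROOT) : line;
        for (String needle : needles) {
            String n = ignoreCase ? needle.toLowerCase(Locale.ROOT) : needle;
            if (hay.contains(n)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Decides whether a line would be displayed.
     * ignoreCase=false, invert=false : behaves like extract
     * ignoreCase=true,  invert=false : behaves like extracti
     * ignoreCase=false, invert=true  : behaves like without
     * ignoreCase=true,  invert=true  : behaves like withouti
     */
    static boolean keep(String line, List<String> needles, boolean ignoreCase, boolean invert) {
        return matchesAny(line, needles, ignoreCase) != invert;
    }

    public static void main(String[] args) {
        List<String> needles = Arrays.asList("wombat", "zebra");
        String line = "A Wombat is a marsupial.";
        System.out.println("extract:  " + keep(line, needles, false, false));
        System.out.println("extracti: " + keep(line, needles, true, false));
        System.out.println("without:  " + keep(line, needles, false, true));
        System.out.println("withouti: " + keep(line, needles, true, true));
    }
}
```

For the sample line, only the case-insensitive match (extracti) and the case-sensitive exclusion (without) would display it, because "Wombat" differs from "wombat" only in case.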

INVOKING

If you have the jar extension set up as executable, you can abbreviate:

    C:\com\mindprod\extract\extract.jar "wombat" "zebra" - E:\mindprod

If you have Jet, you can compile the jars and abbreviate even further:

    extract.exe "wombat" "zebra" - E:\mindprod

Instead of extract, you can use one of the other utilities that work the same way: extracti, without, withouti.

EXTENSIONS

Because extract is designed to work only with text files, it ignores all files except those with the following extensions:

ans, asm, bat, batfrag, btm, btmfrag, c, cfrag, cmd, cpp, cppfrag, css, cssfrag, csv, csvfrag, ctl, doc, dtd, dtdfrag, e, h, hfrag, hpp, hppfrag, htm, html, htmlfrag, ih, ini, java, javafrag, jnlp, jnlpfrag, jsp, jspfrag, list, log, look, lst, mac, mft, pas, policy, prn, properties, ps, rh, sh, site, sql, sqlfrag, tab, txt, use, wiki, xml, xmlfrag, xsd, xsdfrag.

If you need more extensions, please ask, or add them yourself in Extract.java.

SWITCHES

-all    Use the -all switch on the command line to extract a line only if all the strings match.

-where  Use the -where switch on the command line if you want the output to include, in CSV format, the name of the file and the line number where each line was found. The output appears on the console. If you want to capture it to a file, use > redirection.

-s      -s means include the files in all subdirectories of each directory mentioned after it.

-       A lone dash separates the strings and regexes from the file names.

QUOTING

Depending on your operating system, there are a number of characters that have magic meaning on the command line. They won't necessarily be passed through to the program. Ones to watch out for in Vista include " \ & ^ + | < > and space.

Try enclosing awkward characters in quotes, e.g. "<& | >"
To put a " inside quotes use \" e.g. "he said \"Hi\"".
To put a \ inside quotes use \\ e.g. "the \\ is called backslash; the / is called slash".
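
One way to see what actually survives the command line is to echo the arguments back before trusting a search. The tiny Java program below is a hypothetical helper, not part of the extract distribution; the name ShowArgs is made up for illustration. It prints each argument in brackets so that embedded blanks and quotes are visible.

```java
class ShowArgs {
    /** Formats each argument on its own numbered line, bracketed so blanks show. */
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append(i).append(": [").append(args[i]).append("]\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print exactly what the shell and launcher handed to the program.
        System.out.print(describe(args));
    }
}
```

Compile it with javac ShowArgs.java, then run it with the same quoted strings you intend to give extract, e.g. java.exe ShowArgs "he said \"Hi\"" "<& | >", and compare what is printed with what you typed.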
If these quoting rules drive you to distraction, you might try using regexes, since the OS does not interfere with them in any way.

REGEXES

Regexes are an advanced feature primarily for programmers. The regexes to search for are put one per line in a file with any extension. Read up on how they work at http://mindprod.com/jgloss/regex.html

    java.exe -jar extract.jar "apple" "pear" @myregex.txt - somefile1.txt -s somedir
    extract.jar "apple" "pear" @myregex.txt - somefile1.txt -s somedir
    extract.exe "apple" "pear" @myregex.txt - somefile1.txt -s somedir

Because command shells interfere so much with special characters on the command line, it would be highly confusing to put regex strings directly on the command line. Instead you put your regexes in tiny separate files. You put the file names on the command line to the left of the dash, each with a leading @. The @ is not part of the filename itself.

The tiny files must be encoded in UTF-8. They don't include Java string quoting. They don't use \uxxxx the way Java strings do; you just key your accented characters directly. Use the Quoter amanuensis http://mindprod.com/applet/quoter.html to help you compose your regex strings with just regex quoting, not Java string quoting.

For example, if you wanted to search for either \ or / just before the word html, HTML, hTMl... (case-insensitive), you would code

    (?i)[\\/]html

in your little regex file, not

    "(?i)[\\\\/]html"

as you would in a Java program.

It does not matter whether you use extract, extracti, without or withouti: the regexes are all case-sensitive unless you embed (?i) in the regex to turn on case-insensitive mode. You can go back and forth in the same regex, using (?-i) to flip back to case-sensitive mode.

You can mix and match as many regular search strings and regexes as you can fit on the command line. You can speed the program up slightly by putting the most likely matches first. The results are the same no matter what order you put them in.
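
The difference between regex quoting and Java string quoting described above, and the effect of (?i) and (?-i), can be checked with a few lines of Java. This sketch uses java.util.regex directly and assumes that a match anywhere in the line counts (Matcher.find). The class and method names are hypothetical, and the code is not taken from Extract.java.

```java
import java.util.regex.Pattern;

class RegexFileSketch {
    // In the little regex file you would key:  (?i)[\\/]html
    // As a Java string literal every backslash must be doubled again.
    static final Pattern SLASH_HTML = Pattern.compile("(?i)[\\\\/]html");

    // (?i) turns case-insensitive mode on, (?-i) turns it off again:
    // "wombat" in any case, followed by a blank and an exactly lower-case "zebra".
    static final Pattern MIXED = Pattern.compile("(?i)wombat(?-i) zebra");

    /** True if the pattern is found anywhere in the line. */
    static boolean lineMatches(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(lineMatches(SLASH_HTML, "C:\\site\\Html")); // backslash before Html
        System.out.println(lineMatches(SLASH_HTML, "site/hTMl"));      // slash before hTMl
        System.out.println(lineMatches(SLASH_HTML, "index.html"));     // dot, not a slash
        System.out.println(lineMatches(MIXED, "WOMBAT zebra"));
        System.out.println(lineMatches(MIXED, "wombat ZEBRA"));
    }
}
```

The first, second and fourth lines match. The third fails because a dot rather than a slash precedes html, and the fifth fails because ZEBRA falls after the (?-i) that restores case-sensitive matching.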

Be careful that your editor does not trim or add any trailing blanks on your regexes. It is safer to write \p{Blank} or \s (which has a slightly different meaning) than to rely on a literal blank. Any control characters in your regex files are stripped out prior to use. This means you can break your regexes into several lines in the file without penalty.

LIMITATIONS

1. Extract just does searches, no replacements.
2. Extract does not show you the context of where it found each line, just the line itself.
3. Extract is a batch, command-line program. There is no GUI and no interaction.
4. Extract offers no debugging tools to help you figure out why your regexes are failing to find the lines you expect them to.
5. Regexes cannot span lines.
6. Java-style regexes only, no Unix, Perl, Funduc, SlickEdit etc.

Why the haystack logo? These utilities help you find your needles in a haystack.