Wednesday, November 21, 2012

File Format Integrations

File Format Integrations: "Importer 'bin/mahout' jobs
Run any of these with --help to see its options

bin/mahout arff.vector
bin/mahout lucene.vector
bin/mahout seqdirectory
turns a directory of text files into sequence files, one key/value pair per file
bin/mahout SequenceFilesFromMailArchives
parses mailboxes and emits one text body per mail message
bin/mahout regexconverter
reads text lines and emits the regex output lines into SequenceFiles."
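
As a rough sketch of the "run these with --help" advice, the loop below walks through the five importer jobs listed above and asks each one for its options. The MAHOUT_BIN variable and the help_commands function are names made up for this example. The default path bin/mahout assumes you are sitting in the Mahout install directory. If the launcher is not found, the script only prints the commands it would have run.

```shell
#!/bin/sh
# Path to the Mahout launcher. MAHOUT_BIN is a hypothetical override for this
# sketch; the default assumes the current directory is the Mahout install root.
MAHOUT_BIN="${MAHOUT_BIN:-bin/mahout}"

# The five importer jobs named in the list above.
IMPORTERS="arff.vector lucene.vector seqdirectory SequenceFilesFromMailArchives regexconverter"

# Print the --help command line for each importer, one per line.
help_commands() {
  for job in $IMPORTERS; do
    printf '%s %s --help\n' "$MAHOUT_BIN" "$job"
  done
}

if [ -x "$MAHOUT_BIN" ]; then
  # Launcher is present and executable: show the real option list of each job.
  for job in $IMPORTERS; do
    "$MAHOUT_BIN" "$job" --help
  done
else
  # Launcher not found: just show what would be run.
  help_commands
fi
```

To point the loop at a different install, set MAHOUT_BIN before running the script, for example MAHOUT_BIN=/path/to/mahout/bin/mahout (a placeholder path, not a real location).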
