Tuesday, November 27, 2012

hadoop - Converting CSV to SequenceFile - Stack Overflow

hadoop - Converting CSV to SequenceFile - Stack Overflow: "The seqdirectory command treats every file as a document, so in reality you have only one document, and hence you get only one vector. To make it work properly, you would have to make each line of your CSV file a file of its own, where the key of the document is the name of the file and the value is its content. Nonetheless, this is quite impractical if your corpus is large, as disk reading and writing can become painfully slow.

In practice you are better off following the links I share in this comment"
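
For small inputs, the one-file-per-line workaround described in the quote can be sketched roughly as follows. This is only an illustration under a few assumptions: the function name `split_csv_into_documents`, the `doc-` file-name prefix, and the zero-padded numbering are all hypothetical choices, and it assumes every CSV record sits on a single physical line (no quoted fields containing line breaks). It only prepares the directory of files; running seqdirectory on that directory is a separate step not shown here.

```python
import os


def split_csv_into_documents(csv_path, out_dir, prefix="doc"):
    """Write each non-empty line of csv_path to its own file in out_dir.

    The file name acts as the document key and the line as its content.
    Returns the list of file names created, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    created = []
    with open(csv_path, "r", encoding="utf-8", newline="") as src:
        for index, line in enumerate(src):
            content = line.rstrip("\r\n")
            if not content:
                continue  # skip blank lines, they carry no document
            # Hypothetical naming scheme: prefix plus zero-padded line index.
            name = f"{prefix}-{index:08d}.txt"
            with open(os.path.join(out_dir, name), "w", encoding="utf-8") as dst:
                dst.write(content)
            created.append(name)
    return created
```

As the quote warns, this creates one small file per row, so it is likely to be painfully slow on a large corpus; it is probably only reasonable for quick experiments.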
