Jumat, 24 Januari 2014

Parse CSV upload






How easy can it be done?


CSV files are commaseparated files and can be opened and saved using Microsoft Excel. However, in most locales (at least US International and Europe) Excel uses a semicolon instead of a comma to separate fields. With the correct field separator the CSV files will be opened automatically in the right rows and columns in Excel. And saving an Excel table as CSV without using any wizard will deliver CSV files with the locale default field separation.



According to the RFC 4180 spec there are 3 important rules to keep in mind (you may also read "comma" as "semicolon"):



  1. Fields are separated by a comma.

  2. If a comma occurs within a field, then the field have to be surrounded by double quotes.

  3. If a double quote occurs within a field, then the field have to be surrounded by double quotes and the double quote within the field have to be escaped by another double quote.


Valid CSV records are:




field1,field2,field3

"field1,",field2,"fie""ld3"

"""field1""",",field2,",","","","""



Which should be parsed as:




field1    field2    field3

field1,   field2    fie"ld3

"field1"  ,field2,  ,",","



Back to top


Uploading a CSV file using IBM faces


Here is a basic example of the JSF code:



 id="scriptCollector1">
...
id="csvupload">

id="csvfile"
binding="#{myBean.csvFile}"
accept="application/vnd.ms-excel"
/>
for="csvfile" />

value="upload"
action="#{myBean.uploadCsvFile}"
/>

...


The h:message is a placeholder for any errormessage from the hx:fileupload if the application type of the uploaded file don't match the MIME type application/vnd.ms-excel. If you don't need the accept attribute of the hx:fileupload, then just leave this attribute and the h:message away.


The relevant java code for the backing bean MyBean.java:



package mypackage;

import java.io.ByteArrayInputStream;
import java.util.List;

import com.ibm.faces.component.html.HtmlFileupload;
import com.ibm.faces.fileupload.util.ContentElement;

import net.balusc.util.CsvUtil;

public class MyBean {

// Init --------------------------------------------------------------------------------------

private HtmlFileUpload csvFile;

// Actions -----------------------------------------------------------------------------------

public void uploadCsvFile() {

// Get uploaded csv file.
ContentElement csvContent = (ContentElement) csvFile.getValue();

// Process uploaded csv file.
if (csvContent != null) {
ByteArrayInputStream csvInput = new ByteArrayInputStream(csvContent.getContentValue());
List> csvList = CsvUtil.parseCsv(csvInput, ';');
// Now you can do your thing with the CSV List.
} else {
// Empty file error, do your thing.
}
}

// Getters -----------------------------------------------------------------------------------

public HtmlFileupload getCsvFile() {
return csvFile;
}

// Setters -----------------------------------------------------------------------------------

public void setCsvFile(HtmlFileupload csvFile) {
this.csvFile = csvFile;
}

}


The useful CsvUtil class is described in the next chapter:


Back to top


Parsing and formatting the CSV file


This kind of a CSV parser accepts an InputStream which can be feeded by a file upload, reading a local file or any other kind of a binary stream. And it returns a two-dimensional List. The first List contains the CSV records in the appropriate order and the second List contains the fields of each CSV record in the appropriate order.


Formatting a CSV is also relatively easy: just turn the parser around. The CSV formatter should accept a two-dimensional List and return an InputStream with which you can do anything. Saving to a file or streaming it to the response as a file download.




/*
* net/balusc/util/CsvUtil.java
*
* Copyright (C) 2006 BalusC
*
* This program is free software: you can redistribute it and/or modify it under the terms of the
* GNU Lesser General Public License as published by the Free Software Foundation, either version 3
* of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
* even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License along with this library.
* If not, see .
*/


package net.balusc.util;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
* Useful CSV utilities.
*
* @author BalusC
* @link http://balusc.blogspot.com/2006/06/parse-csv-upload.html
*/

public class CsvUtil {

// Init ---------------------------------------------------------------------------------------

// Defaults.
private static final char DEFAULT_CSV_SEPARATOR = ',';
private static final String DEFAULT_LINE_SEPARATOR = "\r\n"; // CRLF.

private CsvUtil() {
// Utility class, hide the constructor.
}

// Parsers ------------------------------------------------------------------------------------

/**
* CSV content parser. Convert an InputStream with the CSV contents to a two-dimensional List
* of Strings representing the rows and columns of the CSV. Each CSV record is expected to be
* separated by the default CSV field separator, a comma.
* @param csvInput The InputStream with the CSV contents.
* @return A two-dimensional List of Strings representing the rows and columns of the CSV.
*/

public static List> parseCsv(InputStream csvInput) {
return parseCsv(csvInput, DEFAULT_CSV_SEPARATOR);
}

/**
* CSV content parser. Convert an InputStream with the CSV contents to a two-dimensional List
* of Strings representing the rows and columns of the CSV. Each CSV record is expected to be
* separated by the specified CSV field separator.
* @param csvInput The InputStream with the CSV contents.
* @param csvSeparator The CSV field separator to be used.
* @return A two-dimensional List of Strings representing the rows and columns of the CSV.
*/

public static List> parseCsv(InputStream csvInput, char csvSeparator) {

// Prepare.
BufferedReader csvReader = null;
List> csvList = new ArrayList>();
String csvRecord = null;

// Process records.
try {
csvReader = new BufferedReader(new InputStreamReader(csvInput, "UTF-8"));
while ((csvRecord = csvReader.readLine()) != null) {
csvList.add(parseCsvRecord(csvRecord, csvSeparator));
}
} catch (IOException e) {
throw new RuntimeException("Reading CSV failed.", e);
} finally {
if (csvReader != null) try {
csvReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}

return csvList;
}

/**
* CSV record parser. Convert a CSV record to a List of Strings representing the fields of the
* CSV record. The CSV record is expected to be separated by the specified CSV field separator.
* @param record The CSV record.
* @param csvSeparator The CSV field separator to be used.
* @return A List of Strings representing the fields of each CSV record.
*/

private static List parseCsvRecord(String record, char csvSeparator) {

// Prepare.
boolean quoted = false;
StringBuilder fieldBuilder = new StringBuilder();
List fields = new ArrayList();

// Process fields.
for (int i = 0; i < record.length(); i++) {
char c = record.charAt(i);
fieldBuilder.append(c);

if (c == '"') {
quoted = !quoted; // Detect nested quotes.
}

if ((!quoted && c == csvSeparator) // The separator ..
|| i + 1 == record.length()) // .. or, the end of record.
{
String field = fieldBuilder.toString() // Obtain the field, ..
.replaceAll(csvSeparator + "$", "") // .. trim ending separator, ..
.replaceAll("^\"|\"$", "") // .. trim surrounding quotes, ..
.replace("\"\"", "\""); // .. and un-escape quotes.
fields.add(field.trim()); // Add field to List.
fieldBuilder = new StringBuilder(); // Reset.
}
}

return fields;
}

// Formatters --------------------------------------------------------------------------------

/**
* CSV content formatter. Convert a two-dimensional List of Objects to a CSV in an InputStream.
* The value of each Object will be obtained by its toString() method. The fields of each CSV
* record will be separated by the default CSV field separator, a comma.
* @param csvList A two-dimensional List of Objects representing the rows and columns of the
* CSV.
* @return The InputStream containing the CSV contents (actually a ByteArrayInputStream).
*/

public static extends Object> InputStream formatCsv(List> csvList) {
return formatCsv(csvList, DEFAULT_CSV_SEPARATOR);
}

/**
* CSV content formatter. Convert a two-dimensional List of Objects to a CSV in an InputStream.
* The value of each Object will be obtained by its toString() method. The fields of each CSV
* record will be separated by the specified CSV field separator.
* @param csvList A two-dimensional List of Objects representing the rows and columns of the
* CSV.
* @param csvSeparator The CSV field separator to be used.
* @return The InputStream containing the CSV contents (actually a ByteArrayInputStream).
*/

public static extends Object> InputStream formatCsv(List> csvList, char csvSeparator) {

// Prepare.
StringBuilder csvContent = new StringBuilder();

// Process records.
for (List csvRecord : csvList) {
if (csvRecord != null) {
csvContent.append(formatCsvRecord(csvRecord, csvSeparator));
}

// Add default line separator.
csvContent.append(DEFAULT_LINE_SEPARATOR);
}

return new ByteArrayInputStream(csvContent.toString().getBytes());
}

/**
* CSV record formatter. Convert a List of Objects representing the fields of a CSV record to a
* String representing the CSV record. The value of each Object will be obtained by its
* toString() method. The fields of the CSV record will be separated by the specified CSV field
* separator.
* @param csvRecord A List of Objects representing the fields of a CSV reecord.
* @param csvSeparator The CSV field separator to be used.
* @return A String representing a CSV record.
*/

private static extends Object> String formatCsvRecord(List csvRecord, char csvSeparator) {

// Prepare.
StringBuilder fields = new StringBuilder();
String separator = String.valueOf(csvSeparator);

// Process fields.
for (Iterator iter = csvRecord.iterator(); iter.hasNext();) {
T object = iter.next();

if (object != null) {
String field = object.toString().replace("\"", "\"\""); // Escape quotes.

if (field.contains(separator) || field.contains("\"")) {
field = "\"" + field + "\""; // Surround with quotes.
}

fields.append(field);
}

if (iter.hasNext()) {
fields.append(separator); // Add field separator.
}
}

return fields.toString();
}

}

Back to top


CSV formatting example


Here is an example how to create, format and save a CSV file quickly:



package mypackage;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import net.balusc.util.CsvUtil;
import net.balusc.util.FileUtil;

public class Test {

public static void main(String[] args) throws IOException {

// Create CSV.
List> csvList = new ArrayList>();
csvList.add(Arrays.asList("field1", "field2", "field3"));
csvList.add(Arrays.asList("field1,", "field2", "fie\"ld3"));
csvList.add(Arrays.asList("\"field1\"", ",field2,", ",\",\",\""));

// Format CSV.
InputStream csvInput = CsvUtil.formatCsv(csvList, ';');

// Save CSV.
FileUtil.write(new File("c:/test.csv"), csvInput);
}

}

You can find the FileUtil here by the way.


Back to top


Copyright - GNU Lesser General Public License


(C) June 2006, BalusC



Source:http://balusc.blogspot.com/2006/06/parse-csv-upload.html

Tidak ada komentar:

Posting Komentar