Difference between revisions of "CSV file reading and writing"

From BeebWiki
Jump to: navigation, search
(Added code for Excel work-arounds.)
m (Note about simplistic Excel solution.)
 
Line 20: Line 20:
  
 
==Records with no quote characters==
 
==Records with no quote characters==
If you know that no records contain ''"'' (quote) characters, or are willing to
+
If you know that no records contain ''"'' (quote) characters, or are willing
ignore them, the following code will read and write a CSV file.
+
to ignore them, the following code will read and write a CSV file.
  
 
     REM Read CSV file from in% to array$(rmax%,fmax%)
 
     REM Read CSV file from in% to array$(rmax%,fmax%)
Line 88: Line 88:
  
 
==Creating files that Excel will read==
 
==Creating files that Excel will read==
Microsoft Excel has several long-standing bugs where it mangles text field where
+
Microsoft Excel has several long-standing bugs where it mangles text fields
it thinks it looks like a value. It strips leading zeros from strings of digits,
+
where it thinks it looks like a numeric value. It strips leading zeros from
it converts strings of digits with an 'e' character into an exponential value,
+
strings of digits, it converts strings of digits with an 'e' character into
as do strings that look like a fraction, and anything that looks like a date
+
an exponential value, and also strings that look like a fraction, and
gets converted to an internal date value.
+
anything that looks like a date gets converted to an internal date value.
  
The following code will read and write CSV files if you need to deal with Excel
+
The simplest thing to do is output all data surrounded with <code>'''=""'''</code>,
loading files containing records that Excel thinks look like numbers. Note that
+
for example <code>'''="018118055"'''</code>. However, this significantly increases
 +
the size of the output file.
 +
 
 +
The following code will read and write CSV files if you need to deal with
 +
Excel loading files containing records that Excel thinks look like numbers,
 +
protecting vulnerable data to output the most compact output file. Note that
 
even with fields protected in this manner, if you load the file into Excel
 
even with fields protected in this manner, if you load the file into Excel
 
'''and then save it out again''' Excel will save the file with anything that
 
'''and then save it out again''' Excel will save the file with anything that

Latest revision as of 16:58, 12 November 2017

Comma Separated Value or CSV files are a common format typically used to represent the contents of a spreadsheet or database (or any other 2D matrix of data cells); they can by read and written by programs such as Microsoft Excel and Microsoft Access. The main properties of CSV files are as follows:

  • Data rows are separated by end-of-line sequences.
  • Data columns are separated by commas.
  • Quotation marks must be escaped using the convention "".
  • Data cells containing commas or quotes must be enclosed in quotes.
  • Any data cell may optionally be enclosed in quotes.

Here is an example of the contents of a CSV file:

   1997,Ford,E350,"AC, ABS, moon roof",3000.00
   1999,Chevy,"Venture ""Extended Edition""",,4900.00

For more details see this Wikipedia article.

Records with no quote characters

If you know that no records contain " (quote) characters, or are willing to ignore them, the following code will read and write a CSV file.

   REM Read CSV file from in% to array$(rmax%,fmax%)
   rmax%=0:REPEAT
     s$=FNrd(in%)
     FOR field%=0 TO fmax%
       i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1)
       IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2)
       array$(rmax%,field%)=r$
     NEXT record%
     rmax%=rmax%+1
   UNTIL EOF#in%
   rmax%=rmax%-1
   
   REM Write CSV file from array$(rmax%,fmax%) to out%
   FOR record%=0 TO rmax%
     FOR field%=0 TO fmax%
       s$=array$(record%,field%)
       IF INSTR(s$,",")<>0 THEN s$=""""+s$+""""
       PROCwr(out%,s$):IF field%<>fmax%:BPUT#out%,ASC","
     NEXT field%
   NEXT record%

This code uses fmax% to hold the maximum field number and rmax% to hold the maximum record number, and PROCwr() to write a string to a file and FNrd() to read a terminated line from a file. PROCwr() and FNrd() are in the StringIO library.

Records containing quote characters

The following code will read and write CSV files if you need to deal with records containing " (quote) characters.

   REM Read CSV file from in% to array$(rmax%,fmax%)
   rmax%=0:REPEAT
     s$=FNrd(in%)
     FOR field%=0 TO fmax%
       i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1)
       IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2)
       i%=0
       REPEAT
         i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1)
       UNTIL i%=0
       array$(rmax%,field%)=r$
     NEXT field%
     rmax%=rmax%+1
   UNTIL EOF#in%
   rmax%=rmax%-1
   
   REM Write CSV file from array$(rmax%,fmax%) to out%
   FOR record%=0 TO rmax%
     FOR field%=0 TO fmax%
       r$=array$(record%,field%)
       i%=0
       REPEAT
         i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1
       UNTIL i%=0
       IF INSTR(r$,",")<>0 THEN r$=""""+r$+""""
       PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC","
     NEXT field%
   NEXT record%

Note that this code is vunerable to String too long errors if escaping embedded characters result in the output string becoming too long.

Creating files that Excel will read

Microsoft Excel has several long-standing bugs where it mangles text fields where it thinks it looks like a numeric value. It strips leading zeros from strings of digits, it converts strings of digits with an 'e' character into an exponential value, and also strings that look like a fraction, and anything that looks like a date gets converted to an internal date value.

The simplest thing to do is output all data surrounded with ="", for example ="018118055". However, this significantly increases the size of the output file.

The following code will read and write CSV files if you need to deal with Excel loading files containing records that Excel thinks look like numbers, protecting vulnerable data to output the most compact output file. Note that even with fields protected in this manner, if you load the file into Excel and then save it out again Excel will save the file with anything that looks like a number converted into Excel's internal number format. There is absolutely nothing that can be done to protect against this.

   REM Read CSV file from in% to array$(rmax%,fmax%)
   rmax%=0:REPEAT
     s$=FNrd(in%)
     FOR field%=0 TO fmax%
       i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1)
       IF LEFT$(r$,2)="=""":r$=MID$(r$,2)
       IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2)
       i%=0
       REPEAT
         i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1)
       UNTIL i%=0
       array$(rmax%,field%)=r$
     NEXT field%
     rmax%=rmax%+1
   UNTIL EOF#in%
   rmax%=rmax%-1
   
   REM Write CSV file from array$(rmax%,fmax%) to out%
   FOR record%=0 TO rmax%
     FOR field%=0 TO fmax%
       r$=array$(record%,field%)
       i%=0
       REPEAT
         i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1
       UNTIL i%=0
       q%=INSTR(r$,",")
       IF q%=0:q%=(ASCr$=48)AND(INSTR(r$,"/")=0)       :REM leading zeros 00001
       IF q%=0:q%=LENSTR$VALr$>8                       :REM long numbers 12345678901234
       IF q%=0:IFVALr$:q%=INSTR(r$,"/")AND(ASCr$<>48)  :REM fractions 12/34
       IF q%=0:q%=MID$(r$,3,1)=" "ANDMID$(r$,7,1)=" "  :REM dates xx XXX xxxx
       IF q%:r$=""""+r$+"""":IFINSTR(r$,",")=0:r$="="+r$
       PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC","
       IF INSTR(r$,",")<>0 THEN r$=""""+r$+""""
       PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC","
     NEXT field%
   NEXT record%

Jgharston 20:20, 29 August 2008 (BST) Jgharston (talk) 00:18, 5 April 2017 (UTC)