CSV file reading and writing

From BeebWiki
Jump to: navigation, search

Comma Separated Value or CSV files are a common format typically used to represent the contents of a spreadsheet or database (or any other 2D matrix of data cells); they can by read and written by programs such as Microsoft Excel and Microsoft Access. The main properties of CSV files are as follows:

  • Data rows are separated by end-of-line sequences.
  • Data columns are separated by commas.
  • Quotation marks must be escaped using the convention "".
  • Data cells containing commas or quotes must be enclosed in quotes.
  • Any data cell may optionally be enclosed in quotes.

Here is an example of the contents of a CSV file:

   1997,Ford,E350,"AC, ABS, moon roof",3000.00
   1999,Chevy,"Venture ""Extended Edition""",,4900.00

For more details see this Wikipedia article.

Records with no quote characters

If you know that no records contain " (quote) characters, or are willing to ignore them, the following code will read and write a CSV file.

   REM Read CSV file from in% to array$(rmax%,fmax%)
   rmax%=0:REPEAT
     s$=FNrd(in%)
     FOR field%=0 TO fmax%
       i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1)
       IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2)
       array$(rmax%,field%)=r$
     NEXT record%
     rmax%=rmax%+1
   UNTIL EOF#in%
   rmax%=rmax%-1
   
   REM Write CSV file from array$(rmax%,fmax%) to out%
   FOR record%=0 TO rmax%
     FOR field%=0 TO fmax%
       s$=array$(record%,field%)
       IF INSTR(s$,",")<>0 THEN s$=""""+s$+""""
       PROCwr(out%,s$):IF field%<>fmax%:BPUT#out%,ASC","
     NEXT field%
   NEXT record%

This code uses fmax% to hold the maximum field number and rmax% to hold the maximum record number, and PROCwr() to write a string to a file and FNrd() to read a terminated line from a file. PROCwr() and FNrd() are in the StringIO library.

Records containing quote characters

The following code will read and write CSV files if you need to deal with records containing " (quote) characters.

   REM Read CSV file from in% to array$(rmax%,fmax%)
   rmax%=0:REPEAT
     s$=FNrd(in%)
     FOR field%=0 TO fmax%
       i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1)
       IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2)
       i%=0
       REPEAT
         i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1)
       UNTIL i%=0
       array$(rmax%,field%)=r$
     NEXT field%
     rmax%=rmax%+1
   UNTIL EOF#in%
   rmax%=rmax%-1
   
   REM Write CSV file from array$(rmax%,fmax%) to out%
   FOR record%=0 TO rmax%
     FOR field%=0 TO fmax%
       r$=array$(record%,field%)
       i%=0
       REPEAT
         i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1
       UNTIL i%=0
       IF INSTR(r$,",")<>0 THEN r$=""""+r$+""""
       PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC","
     NEXT field%
   NEXT record%

Note that this code is vunerable to String too long errors if escaping embedded characters result in the output string becoming too long.

Creating files that Excel will read

Microsoft Excel has several long-standing bugs where it mangles text fields where it thinks it looks like a numeric value. It strips leading zeros from strings of digits, it converts strings of digits with an 'e' character into an exponential value, and also strings that look like a fraction, and anything that looks like a date gets converted to an internal date value.

The simplest thing to do is output all data surrounded with ="", for example ="018118055". However, this significantly increases the size of the output file.

The following code will read and write CSV files if you need to deal with Excel loading files containing records that Excel thinks look like numbers, protecting vulnerable data to output the most compact output file. Note that even with fields protected in this manner, if you load the file into Excel and then save it out again Excel will save the file with anything that looks like a number converted into Excel's internal number format. There is absolutely nothing that can be done to protect against this.

   REM Read CSV file from in% to array$(rmax%,fmax%)
   rmax%=0:REPEAT
     s$=FNrd(in%)
     FOR field%=0 TO fmax%
       i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1)
       IF LEFT$(r$,2)="=""":r$=MID$(r$,2)
       IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2)
       i%=0
       REPEAT
         i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1)
       UNTIL i%=0
       array$(rmax%,field%)=r$
     NEXT field%
     rmax%=rmax%+1
   UNTIL EOF#in%
   rmax%=rmax%-1
   
   REM Write CSV file from array$(rmax%,fmax%) to out%
   FOR record%=0 TO rmax%
     FOR field%=0 TO fmax%
       r$=array$(record%,field%)
       i%=0
       REPEAT
         i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1
       UNTIL i%=0
       q%=INSTR(r$,",")
       IF q%=0:q%=(ASCr$=48)AND(INSTR(r$,"/")=0)       :REM leading zeros 00001
       IF q%=0:q%=LENSTR$VALr$>8                       :REM long numbers 12345678901234
       IF q%=0:IFVALr$:q%=INSTR(r$,"/")AND(ASCr$<>48)  :REM fractions 12/34
       IF q%=0:q%=MID$(r$,3,1)=" "ANDMID$(r$,7,1)=" "  :REM dates xx XXX xxxx
       IF q%:r$=""""+r$+"""":IFINSTR(r$,",")=0:r$="="+r$
       PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC","
       IF INSTR(r$,",")<>0 THEN r$=""""+r$+""""
       PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC","
     NEXT field%
   NEXT record%

Jgharston 20:20, 29 August 2008 (BST) Jgharston (talk) 00:18, 5 April 2017 (UTC)