Difference between revisions of "CSV file reading and writing"
m (Corrected semantics of field and record.) |
(Added code for Excel work-arounds.) |
||
Line 86: | Line 86: | ||
Note that this code is vunerable to ''String too long'' errors if escaping | Note that this code is vunerable to ''String too long'' errors if escaping | ||
embedded characters result in the output string becoming too long. | embedded characters result in the output string becoming too long. | ||
+ | |||
+ | ==Creating files that Excel will read== | ||
+ | Microsoft Excel has several long-standing bugs where it mangles text field where | ||
+ | it thinks it looks like a value. It strips leading zeros from strings of digits, | ||
+ | it converts strings of digits with an 'e' character into an exponential value, | ||
+ | as do strings that look like a fraction, and anything that looks like a date | ||
+ | gets converted to an internal date value. | ||
+ | |||
+ | The following code will read and write CSV files if you need to deal with Excel | ||
+ | loading files containing records that Excel thinks look like numbers. Note that | ||
+ | even with fields protected in this manner, if you load the file into Excel | ||
+ | '''and then save it out again''' Excel will save the file with anything that | ||
+ | looks like a number converted into Excel's internal number format. There is | ||
+ | absolutely nothing that can be done to protect against this. | ||
+ | |||
+ | REM Read CSV file from in% to array$(rmax%,fmax%) | ||
+ | rmax%=0:REPEAT | ||
+ | s$=FNrd(in%) | ||
+ | FOR field%=0 TO fmax% | ||
+ | i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) | ||
+ | IF LEFT$(r$,2)="=""":r$=MID$(r$,2) | ||
+ | IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) | ||
+ | i%=0 | ||
+ | REPEAT | ||
+ | i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1) | ||
+ | UNTIL i%=0 | ||
+ | array$(rmax%,field%)=r$ | ||
+ | NEXT field% | ||
+ | rmax%=rmax%+1 | ||
+ | UNTIL EOF#in% | ||
+ | rmax%=rmax%-1 | ||
+ | |||
+ | REM Write CSV file from array$(rmax%,fmax%) to out% | ||
+ | FOR record%=0 TO rmax% | ||
+ | FOR field%=0 TO fmax% | ||
+ | r$=array$(record%,field%) | ||
+ | i%=0 | ||
+ | REPEAT | ||
+ | i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1 | ||
+ | UNTIL i%=0 | ||
+ | q%=INSTR(r$,",") | ||
+ | IF q%=0:q%=(ASCr$=48)AND(INSTR(r$,"/")=0) :REM leading zeros 00001 | ||
+ | IF q%=0:q%=LENSTR$VALr$>8 :REM long numbers 12345678901234 | ||
+ | IF q%=0:IFVALr$:q%=INSTR(r$,"/")AND(ASCr$<>48) :REM fractions 12/34 | ||
+ | IF q%=0:q%=MID$(r$,3,1)=" "ANDMID$(r$,7,1)=" " :REM dates xx XXX xxxx | ||
+ | IF q%:r$=""""+r$+"""":IFINSTR(r$,",")=0:r$="="+r$ | ||
+ | PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," | ||
+ | IF INSTR(r$,",")<>0 THEN r$=""""+r$+"""" | ||
+ | PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," | ||
+ | NEXT field% | ||
+ | NEXT record% | ||
[[User:Jgharston|Jgharston]] 20:20, 29 August 2008 (BST) | [[User:Jgharston|Jgharston]] 20:20, 29 August 2008 (BST) | ||
+ | [[User:Jgharston|Jgharston]] ([[User talk:Jgharston|talk]]) 00:18, 5 April 2017 (UTC) |
Revision as of 01:18, 5 April 2017
Comma Separated Value or CSV files are a common format typically used to represent the contents of a spreadsheet or database (or any other 2D matrix of data cells); they can by read and written by programs such as Microsoft Excel and Microsoft Access. The main properties of CSV files are as follows:
- Data rows are separated by end-of-line sequences.
- Data columns are separated by commas.
- Quotation marks must be escaped using the convention "".
- Data cells containing commas or quotes must be enclosed in quotes.
- Any data cell may optionally be enclosed in quotes.
Here is an example of the contents of a CSV file:
1997,Ford,E350,"AC, ABS, moon roof",3000.00 1999,Chevy,"Venture ""Extended Edition""",,4900.00
For more details see this Wikipedia article.
Records with no quote characters
If you know that no records contain " (quote) characters, or are willing to ignore them, the following code will read and write a CSV file.
REM Read CSV file from in% to array$(rmax%,fmax%) rmax%=0:REPEAT s$=FNrd(in%) FOR field%=0 TO fmax% i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) array$(rmax%,field%)=r$ NEXT record% rmax%=rmax%+1 UNTIL EOF#in% rmax%=rmax%-1 REM Write CSV file from array$(rmax%,fmax%) to out% FOR record%=0 TO rmax% FOR field%=0 TO fmax% s$=array$(record%,field%) IF INSTR(s$,",")<>0 THEN s$=""""+s$+"""" PROCwr(out%,s$):IF field%<>fmax%:BPUT#out%,ASC"," NEXT field% NEXT record%
This code uses fmax%
to hold the maximum field number
and rmax%
to hold the maximum record number, and
PROCwr()
to write a string to a file and
FNrd()
to read a terminated line from a file.
PROCwr()
and FNrd()
are in the
StringIO library.
Records containing quote characters
The following code will read and write CSV files if you need to deal with records containing " (quote) characters.
REM Read CSV file from in% to array$(rmax%,fmax%) rmax%=0:REPEAT s$=FNrd(in%) FOR field%=0 TO fmax% i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) i%=0 REPEAT i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1) UNTIL i%=0 array$(rmax%,field%)=r$ NEXT field% rmax%=rmax%+1 UNTIL EOF#in% rmax%=rmax%-1 REM Write CSV file from array$(rmax%,fmax%) to out% FOR record%=0 TO rmax% FOR field%=0 TO fmax% r$=array$(record%,field%) i%=0 REPEAT i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1 UNTIL i%=0 IF INSTR(r$,",")<>0 THEN r$=""""+r$+"""" PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," NEXT field% NEXT record%
Note that this code is vunerable to String too long errors if escaping embedded characters result in the output string becoming too long.
Creating files that Excel will read
Microsoft Excel has several long-standing bugs where it mangles text field where it thinks it looks like a value. It strips leading zeros from strings of digits, it converts strings of digits with an 'e' character into an exponential value, as do strings that look like a fraction, and anything that looks like a date gets converted to an internal date value.
The following code will read and write CSV files if you need to deal with Excel loading files containing records that Excel thinks look like numbers. Note that even with fields protected in this manner, if you load the file into Excel and then save it out again Excel will save the file with anything that looks like a number converted into Excel's internal number format. There is absolutely nothing that can be done to protect against this.
REM Read CSV file from in% to array$(rmax%,fmax%) rmax%=0:REPEAT s$=FNrd(in%) FOR field%=0 TO fmax% i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) IF LEFT$(r$,2)="=""":r$=MID$(r$,2) IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) i%=0 REPEAT i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1) UNTIL i%=0 array$(rmax%,field%)=r$ NEXT field% rmax%=rmax%+1 UNTIL EOF#in% rmax%=rmax%-1 REM Write CSV file from array$(rmax%,fmax%) to out% FOR record%=0 TO rmax% FOR field%=0 TO fmax% r$=array$(record%,field%) i%=0 REPEAT i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1 UNTIL i%=0 q%=INSTR(r$,",") IF q%=0:q%=(ASCr$=48)AND(INSTR(r$,"/")=0) :REM leading zeros 00001 IF q%=0:q%=LENSTR$VALr$>8 :REM long numbers 12345678901234 IF q%=0:IFVALr$:q%=INSTR(r$,"/")AND(ASCr$<>48) :REM fractions 12/34 IF q%=0:q%=MID$(r$,3,1)=" "ANDMID$(r$,7,1)=" " :REM dates xx XXX xxxx IF q%:r$=""""+r$+"""":IFINSTR(r$,",")=0:r$="="+r$ PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," IF INSTR(r$,",")<>0 THEN r$=""""+r$+"""" PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," NEXT field% NEXT record%
Jgharston 20:20, 29 August 2008 (BST) Jgharston (talk) 00:18, 5 April 2017 (UTC)