Difference between revisions of "CSV file reading and writing"
m (Tweeked semantics of rmax%, fmax%, corrected error.) |
m (Note about simplistic Excel solution.) |
||
(2 intermediate revisions by the same user not shown) | |||
Line 20: | Line 20: | ||
==Records with no quote characters== | ==Records with no quote characters== | ||
− | If you know that no records contain ''"'' (quote) characters, or are willing to | + | If you know that no records contain ''"'' (quote) characters, or are willing |
− | ignore them, the following code will read and write a CSV file. | + | to ignore them, the following code will read and write a CSV file. |
− | REM Read CSV file from in% to array$( | + | REM Read CSV file from in% to array$(rmax%,fmax%) |
− | + | rmax%=0:REPEAT | |
s$=FNrd(in%) | s$=FNrd(in%) | ||
− | FOR | + | FOR field%=0 TO fmax% |
i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) | i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) | ||
IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) | IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) | ||
− | array$( | + | array$(rmax%,field%)=r$ |
NEXT record% | NEXT record% | ||
− | + | rmax%=rmax%+1 | |
UNTIL EOF#in% | UNTIL EOF#in% | ||
− | + | rmax%=rmax%-1 | |
− | REM Write CSV file from array$( | + | REM Write CSV file from array$(rmax%,fmax%) to out% |
− | FOR | + | FOR record%=0 TO rmax% |
− | FOR | + | FOR field%=0 TO fmax% |
− | s$=array$( | + | s$=array$(record%,field%) |
IF INSTR(s$,",")<>0 THEN s$=""""+s$+"""" | IF INSTR(s$,",")<>0 THEN s$=""""+s$+"""" | ||
− | PROCwr(out%,s$):IF | + | PROCwr(out%,s$):IF field%<>fmax%:BPUT#out%,ASC"," |
− | NEXT | + | NEXT field% |
− | NEXT | + | NEXT record% |
− | This code uses <code>''' | + | This code uses <code>'''fmax%'''</code> to hold the maximum field number |
and <code>'''rmax%'''</code> to hold the maximum record number, and | and <code>'''rmax%'''</code> to hold the maximum record number, and | ||
<code>'''PROCwr()'''</code> to write a string to a file and | <code>'''PROCwr()'''</code> to write a string to a file and | ||
Line 86: | Line 86: | ||
Note that this code is vunerable to ''String too long'' errors if escaping | Note that this code is vunerable to ''String too long'' errors if escaping | ||
embedded characters result in the output string becoming too long. | embedded characters result in the output string becoming too long. | ||
+ | |||
+ | ==Creating files that Excel will read== | ||
+ | Microsoft Excel has several long-standing bugs where it mangles text fields | ||
+ | where it thinks it looks like a numeric value. It strips leading zeros from | ||
+ | strings of digits, it converts strings of digits with an 'e' character into | ||
+ | an exponential value, and also strings that look like a fraction, and | ||
+ | anything that looks like a date gets converted to an internal date value. | ||
+ | |||
+ | The simplest thing to do is output all data surrounded with <code>'''=""'''</code>, | ||
+ | for example <code>'''="018118055"'''</code>. However, this significantly increases | ||
+ | the size of the output file. | ||
+ | |||
+ | The following code will read and write CSV files if you need to deal with | ||
+ | Excel loading files containing records that Excel thinks look like numbers, | ||
+ | protecting vulnerable data to output the most compact output file. Note that | ||
+ | even with fields protected in this manner, if you load the file into Excel | ||
+ | '''and then save it out again''' Excel will save the file with anything that | ||
+ | looks like a number converted into Excel's internal number format. There is | ||
+ | absolutely nothing that can be done to protect against this. | ||
+ | |||
+ | REM Read CSV file from in% to array$(rmax%,fmax%) | ||
+ | rmax%=0:REPEAT | ||
+ | s$=FNrd(in%) | ||
+ | FOR field%=0 TO fmax% | ||
+ | i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) | ||
+ | IF LEFT$(r$,2)="=""":r$=MID$(r$,2) | ||
+ | IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) | ||
+ | i%=0 | ||
+ | REPEAT | ||
+ | i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1) | ||
+ | UNTIL i%=0 | ||
+ | array$(rmax%,field%)=r$ | ||
+ | NEXT field% | ||
+ | rmax%=rmax%+1 | ||
+ | UNTIL EOF#in% | ||
+ | rmax%=rmax%-1 | ||
+ | |||
+ | REM Write CSV file from array$(rmax%,fmax%) to out% | ||
+ | FOR record%=0 TO rmax% | ||
+ | FOR field%=0 TO fmax% | ||
+ | r$=array$(record%,field%) | ||
+ | i%=0 | ||
+ | REPEAT | ||
+ | i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1 | ||
+ | UNTIL i%=0 | ||
+ | q%=INSTR(r$,",") | ||
+ | IF q%=0:q%=(ASCr$=48)AND(INSTR(r$,"/")=0) :REM leading zeros 00001 | ||
+ | IF q%=0:q%=LENSTR$VALr$>8 :REM long numbers 12345678901234 | ||
+ | IF q%=0:IFVALr$:q%=INSTR(r$,"/")AND(ASCr$<>48) :REM fractions 12/34 | ||
+ | IF q%=0:q%=MID$(r$,3,1)=" "ANDMID$(r$,7,1)=" " :REM dates xx XXX xxxx | ||
+ | IF q%:r$=""""+r$+"""":IFINSTR(r$,",")=0:r$="="+r$ | ||
+ | PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," | ||
+ | IF INSTR(r$,",")<>0 THEN r$=""""+r$+"""" | ||
+ | PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," | ||
+ | NEXT field% | ||
+ | NEXT record% | ||
[[User:Jgharston|Jgharston]] 20:20, 29 August 2008 (BST) | [[User:Jgharston|Jgharston]] 20:20, 29 August 2008 (BST) | ||
+ | [[User:Jgharston|Jgharston]] ([[User talk:Jgharston|talk]]) 00:18, 5 April 2017 (UTC) |
Latest revision as of 16:58, 12 November 2017
Comma Separated Value or CSV files are a common format typically used to represent the contents of a spreadsheet or database (or any other 2D matrix of data cells); they can by read and written by programs such as Microsoft Excel and Microsoft Access. The main properties of CSV files are as follows:
- Data rows are separated by end-of-line sequences.
- Data columns are separated by commas.
- Quotation marks must be escaped using the convention "".
- Data cells containing commas or quotes must be enclosed in quotes.
- Any data cell may optionally be enclosed in quotes.
Here is an example of the contents of a CSV file:
1997,Ford,E350,"AC, ABS, moon roof",3000.00 1999,Chevy,"Venture ""Extended Edition""",,4900.00
For more details see this Wikipedia article.
Records with no quote characters
If you know that no records contain " (quote) characters, or are willing to ignore them, the following code will read and write a CSV file.
REM Read CSV file from in% to array$(rmax%,fmax%) rmax%=0:REPEAT s$=FNrd(in%) FOR field%=0 TO fmax% i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) array$(rmax%,field%)=r$ NEXT record% rmax%=rmax%+1 UNTIL EOF#in% rmax%=rmax%-1 REM Write CSV file from array$(rmax%,fmax%) to out% FOR record%=0 TO rmax% FOR field%=0 TO fmax% s$=array$(record%,field%) IF INSTR(s$,",")<>0 THEN s$=""""+s$+"""" PROCwr(out%,s$):IF field%<>fmax%:BPUT#out%,ASC"," NEXT field% NEXT record%
This code uses fmax%
to hold the maximum field number
and rmax%
to hold the maximum record number, and
PROCwr()
to write a string to a file and
FNrd()
to read a terminated line from a file.
PROCwr()
and FNrd()
are in the
StringIO library.
Records containing quote characters
The following code will read and write CSV files if you need to deal with records containing " (quote) characters.
REM Read CSV file from in% to array$(rmax%,fmax%) rmax%=0:REPEAT s$=FNrd(in%) FOR field%=0 TO fmax% i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) i%=0 REPEAT i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1) UNTIL i%=0 array$(rmax%,field%)=r$ NEXT field% rmax%=rmax%+1 UNTIL EOF#in% rmax%=rmax%-1 REM Write CSV file from array$(rmax%,fmax%) to out% FOR record%=0 TO rmax% FOR field%=0 TO fmax% r$=array$(record%,field%) i%=0 REPEAT i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1 UNTIL i%=0 IF INSTR(r$,",")<>0 THEN r$=""""+r$+"""" PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," NEXT field% NEXT record%
Note that this code is vunerable to String too long errors if escaping embedded characters result in the output string becoming too long.
Creating files that Excel will read
Microsoft Excel has several long-standing bugs where it mangles text fields where it thinks it looks like a numeric value. It strips leading zeros from strings of digits, it converts strings of digits with an 'e' character into an exponential value, and also strings that look like a fraction, and anything that looks like a date gets converted to an internal date value.
The simplest thing to do is output all data surrounded with =""
,
for example ="018118055"
. However, this significantly increases
the size of the output file.
The following code will read and write CSV files if you need to deal with Excel loading files containing records that Excel thinks look like numbers, protecting vulnerable data to output the most compact output file. Note that even with fields protected in this manner, if you load the file into Excel and then save it out again Excel will save the file with anything that looks like a number converted into Excel's internal number format. There is absolutely nothing that can be done to protect against this.
REM Read CSV file from in% to array$(rmax%,fmax%) rmax%=0:REPEAT s$=FNrd(in%) FOR field%=0 TO fmax% i%=INSTR(s$+",",","):r$=LEFT$(s$,i%-1):s$=MID$(s$,i%+1) IF LEFT$(r$,2)="=""":r$=MID$(r$,2) IF LEFT$(r$,1)="""":IF RIGHT$(r$,1)="""":r$=MID$(r$,2,LENr$-2) i%=0 REPEAT i%=INSTR(r$,"""""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%+1) UNTIL i%=0 array$(rmax%,field%)=r$ NEXT field% rmax%=rmax%+1 UNTIL EOF#in% rmax%=rmax%-1 REM Write CSV file from array$(rmax%,fmax%) to out% FOR record%=0 TO rmax% FOR field%=0 TO fmax% r$=array$(record%,field%) i%=0 REPEAT i%=INSTR(r$,"""",i%+1):IF i%:r$=LEFT$(r$,i%)+MID$(r$,i%):i%=i%+1 UNTIL i%=0 q%=INSTR(r$,",") IF q%=0:q%=(ASCr$=48)AND(INSTR(r$,"/")=0) :REM leading zeros 00001 IF q%=0:q%=LENSTR$VALr$>8 :REM long numbers 12345678901234 IF q%=0:IFVALr$:q%=INSTR(r$,"/")AND(ASCr$<>48) :REM fractions 12/34 IF q%=0:q%=MID$(r$,3,1)=" "ANDMID$(r$,7,1)=" " :REM dates xx XXX xxxx IF q%:r$=""""+r$+"""":IFINSTR(r$,",")=0:r$="="+r$ PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," IF INSTR(r$,",")<>0 THEN r$=""""+r$+"""" PROCwr(out%,r$):IF field%<>fmax%:BPUT#out%,ASC"," NEXT field% NEXT record%
Jgharston 20:20, 29 August 2008 (BST) Jgharston (talk) 00:18, 5 April 2017 (UTC)