<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Living in CSV Hell</title>
	<atom:link href="http://www.monkeydust.net/2008/04/17/living-in-csv-hell/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.monkeydust.net/2008/04/17/living-in-csv-hell/</link>
	<description>ramblings from the techie side of life</description>
	<pubDate>Sat, 22 Nov 2008 03:41:45 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: Nick</title>
		<link>http://www.monkeydust.net/2008/04/17/living-in-csv-hell/#comment-11</link>
		<dc:creator>Nick</dc:creator>
		<pubDate>Fri, 18 Apr 2008 14:13:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.monkeydust.net/?p=15#comment-11</guid>
		<description>Yeah, I had thought of Perl, but sadly my knowledge of Perl is pretty much zero.

With regards to using MySQL, I was tempted here too, as it would make it *really* easy to dump the file into a database and then do all the work on it I wanted, but after thinking it through some more I've come up with what I hope is a simpler solution that doesn't even care that it's a CSV file.

As all I really want to do is change all the column titles and remove some blank lines at the top, I'm just using good old fopen(). It's not elegant, but importantly it's something I can throw together quickly, and it still saves me time compared with doing it by hand.
I've also been able to throw in some basic checking. I'm not concerned about what's actually in each cell at this point; I'm just checking for erroneous characters (sometimes we get random question marks or other bits that throw off the target database). Normally this would all be checked by hand and only take a minute; however, for a few weeks a year we get thousands of applications to check, so even a simple script like this would let us automate at least some basic checking and flag records to check later by hand, hopefully saving us a lot of time.

Don't worry though, Brian, I'm not about to try and run this as a shell script. It's all going into a basic webpage on our intranet site, so files can be uploaded and then downloaded again when they're fixed, along with a page listing any suspect entries.

Thanks to you both for your ideas. Thankfully a night's sleep allowed me to regain enough of my sanity to come up with a simple(r) solution that doesn't tax my frankly limited programming skills!</description>
		<content:encoded><![CDATA[<p>Yeah, I had thought of Perl, but sadly my knowledge of Perl is pretty much zero.</p>
<p>With regards to using MySQL, I was tempted here too, as it would make it *really* easy to dump the file into a database and then do all the work on it I wanted, but after thinking it through some more I&#8217;ve come up with what I hope is a simpler solution that doesn&#8217;t even care that it&#8217;s a CSV file.</p>
<p>As all I really want to do is change all the column titles and remove some blank lines at the top, I&#8217;m just using good old fopen(). It&#8217;s not elegant, but importantly it&#8217;s something I can throw together quickly, and it still saves me time compared with doing it by hand.<br />
I&#8217;ve also been able to throw in some basic checking. I&#8217;m not concerned about what&#8217;s actually in each cell at this point; I&#8217;m just checking for erroneous characters (sometimes we get random question marks or other bits that throw off the target database). Normally this would all be checked by hand and only take a minute; however, for a few weeks a year we get thousands of applications to check, so even a simple script like this would let us automate at least some basic checking and flag records to check later by hand, hopefully saving us a lot of time.</p>
<p>Don&#8217;t worry though, Brian, I&#8217;m not about to try and run this as a shell script. It&#8217;s all going into a basic webpage on our intranet site, so files can be uploaded and then downloaded again when they&#8217;re fixed, along with a page listing any suspect entries.</p>
<p>Thanks to you both for your ideas. Thankfully a night&#8217;s sleep allowed me to regain enough of my sanity to come up with a simple(r) solution that doesn&#8217;t tax my frankly limited programming skills!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian</title>
		<link>http://www.monkeydust.net/2008/04/17/living-in-csv-hell/#comment-10</link>
		<dc:creator>Brian</dc:creator>
		<pubDate>Thu, 17 Apr 2008 23:56:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.monkeydust.net/?p=15#comment-10</guid>
		<description>Many databases can read delimited data files directly.  MySQL, for example, can do it via LOAD DATA INFILE.  I'd be surprised to find a database that can't do it.

Ruby also has a couple nice CSV libraries.  I've used them a lot.   PHP isn't really a shell scripting kind of language.

Parsing CSV files isn't quite trivial, so resist writing your own parser if you can.  Everyone's first gut reaction is to use a regular expression, but you have to handle commas inside quoted strings and escaped quotes inside quoted strings etc.</description>
		<content:encoded><![CDATA[<p>Many databases can read delimited data files directly.  MySQL, for example, can do it via LOAD DATA INFILE.  I&#8217;d be surprised to find a database that can&#8217;t do it.</p>
<p>Ruby also has a couple nice CSV libraries.  I&#8217;ve used them a lot.   PHP isn&#8217;t really a shell scripting kind of language.</p>
<p>Parsing CSV files isn&#8217;t quite trivial, so resist writing your own parser if you can.  Everyone&#8217;s first gut reaction is to use a regular expression, but you have to handle commas inside quoted strings and escaped quotes inside quoted strings etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leifbk</title>
		<link>http://www.monkeydust.net/2008/04/17/living-in-csv-hell/#comment-9</link>
		<dc:creator>Leifbk</dc:creator>
		<pubDate>Thu, 17 Apr 2008 16:25:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.monkeydust.net/?p=15#comment-9</guid>
		<description>That's the kind of problem space that Perl was designed for, and in which it really shines. There's even a special CSV module that takes care of most of the common problems in that area.

Even though I do a lot of coding in PHP, I'll switch to Perl anytime I need to do some real data munging.

regards, Leif</description>
		<content:encoded><![CDATA[<p>That&#8217;s the kind of problem space that Perl was designed for, and in which it really shines. There&#8217;s even a special CSV module that takes care of most of the common problems in that area.</p>
<p>Even though I do a lot of coding in PHP, I&#8217;ll switch to Perl anytime I need to do some real data munging.</p>
<p>regards, Leif</p>
]]></content:encoded>
	</item>
</channel>
</rss>
