Friday, September 16, 2005

Data Transformation Adventure

neopoleon.com
If you are lucky, you will find the contorted logic in this comic strip totally alien.

If you are like me, it would be way too familiar.

For those of you who are not sure how to store your marketing (or any type of) data: Enter it in one sheet of Excel and export it as tab-delimited text or as the csv format. This will be the simplest and easiest to work with computers. Make sure there is no empty row or column. Make sure the column headers are descriptive. Note that Excel can only store up to about 65,000 rows of data.

If your data is too complex for a sheet of Excel, you might want to consider using some kind of a desktop database such as Access. It's not as simple as just typing in Excel but still pretty simple.

If you have a large amount of data and many people will access it frequently, it's time to use a reasonably nice database such as Postgresql, MySQL, Microsoft SQL Server, Oracle, IBM DB2, etc. The choice depends on your budget and the skill of the person(s) who will help you. Postgresql and MySQL are open source software so they don't cost you any money. The others cost some money but you might be able to piggyback your data onto your company's existing databases.

I remember asking my database administrator friend 15 years ago: Why don't we store everything as simple text file? Turns out that simple text files are quite good for transferring the data between applications but not so good for fast manipulation of data (such as ranking, grouping, summing, etc.) So, if you want to manipulate your data a lot, use a real database (Postgresql and friends...), if you want to transfer the information between different applications, export it from the database into some text format. Tab-delimited text is quite serviceable; for more fancy stuff, you can use XML which is a very fancy text file.

No comments: