In this episode, I give some examples of common and uncommon tools for processing data files.
Hosted by b-yeezi on 2016-08-08. This episode is flagged as Clean and is released under a CC-BY-SA license.
Here are some of the tools I use to process and clean data from all manner of customers:
The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It will also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
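As a rough illustration of the renaming described above, here is a minimal sketch that previews detox's changes before applying them. The function name `preview_then_detox` and its directory argument are hypothetical and not from the episode. The `-n` (dry run), `-v` (verbose) and `-r` (recursive) options are the ones I believe detox provides, so check `man detox` on your system first.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the episode): preview, then apply, detox renames.
preview_then_detox() {
    dir=${1:-.}
    # Dry run: list what would be renamed without touching anything.
    detox -n -v -r "$dir"
    # Real run: rename the files under $dir, recursing into subdirectories.
    detox -v -r "$dir"
}
```

Calling `preview_then_detox ./incoming` (a made-up directory name) would print the planned renames and then perform them. In practice you may prefer to run only the dry-run line and inspect its output before running the second command.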
See other episodes for great sed information. I like to remove DOS end-of-line and end-of-file characters:
sed -i 's/
sed -i 's/\r//g' *.txt
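The clean-up above can be wrapped in a small helper, sketched here under two assumptions that are mine rather than the episode's: that the sed in use is GNU sed (for `-i` and the `\r` and `\x1a` escapes), and that the DOS end-of-file character in question is Ctrl-Z (byte 0x1A). The function name `strip_dos_chars` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: strip DOS line-ending and end-of-file bytes in place.
# Assumes GNU sed; \r and \x1a are GNU extensions, not POSIX sed.
strip_dos_chars() {
    for f in "$@"; do
        # First expression: drop every carriage return (the CR of CRLF).
        # Second expression: drop every Ctrl-Z byte (0x1A), assumed here
        # to be the DOS end-of-file marker.
        sed -i -e 's/\r//g' -e 's/\x1a//g' "$f"
    done
}
```

Usage would be something like `strip_dos_chars *.txt`. Because `-i` edits files in place, it is worth trying this on copies first.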
- pdftotext -layout
- unix2dos and dos2unix
- buffer searches (:vim /pattern/ ##)
- Ack plugin
- bufdo (:bufdo %s/pattern/replace/ge | update)
Comment #1 posted on 2016-08-09T00:46:44Z by Jonathan Kulp
Thanks this is a genius tool. Never heard of it before.
Comment #2 posted on 2016-08-17T16:55:35Z by Ken Fallon
I love detox
detox -vr *
wow what an excellent tool.
Comment #3 posted on 2016-08-19T16:30:03Z by Dave Morriss
Thanks for mentioning 'ack'
Wow! I had never encountered 'ack' before. It's amazing.
I have written a bunch of Bash scripts to work with a PostgreSQL database (yes, I know, it's a bit like wearing a hair shirt; self mortification), and I found I could do things like:
ack --shell --pager=more psql .
There's no other easy way to do this that I know of.
Thanks very much for pointing this one out.
Comment #4 posted on 2016-08-21T14:53:50Z by ivor
I always love vim tips. So I got pulled in looking at the buffer search. Then I noticed the other tools mentioned. Most of them I know about and use all that are relevant to me very frequently. So now I'm going to subscribe...