
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr2698 :: XSV for fast CSV manipulations - Part 1

Written in Rust, xsv is my new favorite tool for manipulating CSV files.


Hosted by b-yeezi on 2018-12-05. Flagged as Clean and released under a CC-BY-SA license.
Tags: CSV,XSV.

XSV for fast CSV manipulations - Part 1: Basic Usage

https://github.com/BurntSushi/xsv

Introduction

xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable:

  1. Simple tasks should be easy.
  2. Performance trade offs should be exposed in the CLI interface.
  3. Composition should not come at the expense of performance.

We will be using the CSV file provided in the documentation.

Commands covered in this episode

  • count - Count the rows of CSV data
  • headers - Show the headers of CSV data, or show the intersection of all headers between many CSV files
  • index - Create an index for a CSV file. This is very quick and provides constant time indexing into the CSV file.
  • frequency - Build frequency tables of each column in CSV data.
  • stats - Show basic types and statistics of each column in the CSV file (e.g., mean, standard deviation, median, range).
  • sort - Sort CSV data
  • select - Select or re-order columns from CSV data.
  • slice - Slice rows from any part of a CSV file. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
  • search - Run a regex over CSV data. Applies the regex to each field individually and shows only matching rows.
  • table - Show aligned output of any CSV data using elastic tabstops.
  • flatten - A flattened view of CSV records. Useful for viewing one record at a time.
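To make the descriptions above concrete, here is a minimal Python sketch of what a few of these commands compute. It is not how xsv is implemented, and it says nothing about xsv's speed; it only mirrors the row and column semantics of count, headers, frequency, select and search using the standard library. The sample data and the helper names (read_rows, select, search) are invented for this illustration and are not taken from the xsv documentation.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data, invented for this sketch
# (this is NOT the CSV file used in the xsv documentation).
SAMPLE = """Country,City,Population
us,Boston,650000
us,Austin,950000
fr,Lyon,510000
"""


def read_rows(text):
    """Parse CSV text into (headers, list of data rows)."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    return headers, list(reader)


headers, rows = read_rows(SAMPLE)

# Roughly what `xsv count` reports: data rows, header excluded.
count = len(rows)

# Roughly what `xsv headers` shows: column names with 1-based positions.
numbered_headers = list(enumerate(headers, start=1))

# Roughly what `xsv frequency` builds: one frequency table per column.
frequency = {
    name: Counter(row[i] for row in rows)
    for i, name in enumerate(headers)
}


def select(headers, rows, names):
    """Pick and re-order columns by name, in the spirit of `xsv select`."""
    positions = [headers.index(name) for name in names]
    return names, [[row[p] for p in positions] for row in rows]


def search(rows, pattern):
    """Keep rows where any single field matches the regex, like `xsv search`."""
    regex = re.compile(pattern)
    return [row for row in rows if any(regex.search(field) for field in row)]
```

For example, select(headers, rows, ["City", "Country"]) returns the two named columns in the requested order, so there is no need to guess ordinal column positions, and search(rows, r"^L") keeps only the rows that contain a field starting with "L".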

Comments


Comment #1 posted on 2018-12-05T14:58:29Z by Mike Ray

Good timing

What a brilliant tool and a great show.

This has come at a good time for me as I am deep into a large screen-scraping project which is yielding complex CSV files with many columns.

Like b-yeezi I frequently get involved with textual data manipulation in all kinds of formats. I did not know about xsv and have often had to guess the ordinal position of specific columns, and had to do all kinds of slicing and dicing operations.

Not easy at the best of times, and time consuming. All the more so if you can't easily guess the column position because you can't see.

So the timing of this show is great for me. And this is real hacking.
