Showing headlines posted by Bob_Mesibov
How to find distances between lat/lons for geochecking
Finding incorrect latitude/longitude figures in a big data table can be difficult, but checking can be made easier with an AWK calculation.
Mapping with gnuplot
How to use gnuplot to put data points on a basemap.
Repair job: separate the tandem repeats
By mistake, I put two identical data items in the same field as a tandem repeat. Here's how either sed or AWK could be used to split the repeat and put its parts into two different fields.
Bird watching with AWK and grep
A "fileA-in-fileB" search is a search of fileB for lines that match any of the lines in fileA. I thought I'd post what happens when you systematically vary a fileA-in-fileB search. TL;DR: AWK wins.
How to enter nothing in a database
I call them NITS, which is short for Nothing Interesting To Say. They're the filler items that appear in spreadsheets and databases when the person entering the data has no information for a particular field.
How to validate ISO 8601 dates without regex
How to check for format and content errors in YYYY-MM-DD fields with AWK.
Fightin' fields
Disagreements between fields in a database can be tricky to diagnose and even harder to detect.
Fuzzy matching in practice
Approximate or "fuzzy" matching on the command line is easily done with tre-agrep. Here's a practical example.
Data on clay
If you were wandering the streets of a busy city in the Fertile Crescent a few thousand years ago, you might have run across someone jotting down a few notes on a small clay tablet. The jotting-down was done with the cut stem of a reed, and the result is today called cuneiform writing, after the wedge shape of some of the written elements (Latin cuneus, "wedge"). Cuneiform writing on clay was around for at least 3000 years and was adopted for use in a range of languages.
iconv and illegal input sequences
Character encoding is a big topic and this post isn't going to cover it all. In fact, I'm only going to deal with one particular problem: what to do when you're converting a file with the iconv utility and you get an error message... How to get around the "illegal input sequence" roadblock.
Displaying data from table fragments
This post was inspired by a presentation I watched at a recent conference. The presenter had collected a large number of data tables, each of which had a selection of fields from a large pool of fields... How to neatly display which fields were in which tables?
A record pager built with YAD
How to turn a YAD dialog into a GUI viewer/pager for records in a data table.
48 sea levels and a trope for your terminal
A bulk string replacement with AWK, and that ACCESS DENIED thing. The messiest dataset I've ever audited included a field that illustrates a key rule of Fussy Database Management: Never let users enter free-text data in a field unless absolutely necessary.
Mojibake detective work
A close look at some character encoding problems.
BASHing data
It's easy to find records in a data table that are entirely blank or only contain tabs or whitespace. Just ask AWK to print the line numbers of any records that have no fields.
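A minimal sketch of that idea, not necessarily the command used in the post itself. It assumes AWK's default field splitting on spaces and tabs (no -F option), and the sample table and the variable names `sample` and `blank_lines` are made up for illustration:

```shell
# Hypothetical sample table: lines 2 and 4 are blank or whitespace-only.
sample=$(mktemp)
printf 'a\tb\n\n c\td\n \t \n' > "$sample"

# With the default field separator, a line that is empty or holds only
# spaces and tabs has no fields (NF is 0), so !NF selects it.
# NR is the number of the current line.
blank_lines=$(awk '!NF {print NR}' "$sample")
echo "$blank_lines"

# Remove the temporary sample file.
rm -f "$sample"
```

Here the command prints 2 and 4. Note that if a tab is set as the field separator with -F"\t", a line containing only tabs no longer has zero fields, so this check relies on the default splitting.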
GUI ways to view and edit big text files
glogg, gvim, Geany and csvpad, but not spreadsheets.
Question marks that aren't really question marks
Some question marks are signs that a program is baffled by a character's encoding.
Curse of the CSV monster
How to convert a CSV to a TSV and complain about CSVs at the same time.
Truncated data items
Some command-line tricks for detecting truncations, such as a 100-character string clipped to 50 characters in a database.
Combo characters
How to deal with Unicode's combining characters on the command line.