================================================================
BUGFIXES

! $ cat a
  x=2
  a=3

  $ mlr join -j a -f a a
  mlr: internal coding error: failed transition from prefill state.

* mlr_logistic_regression: Newton-Raphson convergence failed after 101 iterations. m=nan, b=nan.
  happens when data are too *good*.

================================================================
TOP OF LIST

* make a 2.3.2
  - filter -x
  - logireg w/ caveat
  - stats2 hold-and-fit
  - improved heterogeneity for sort,stats1,stats2,step,head,tail,top,sample,uniq,count-distinct
  - implicit-csv-header feature

! join on partial value-field matches

MINOR: star-put fcn (for cheesy bar-charts) w/ min/max
  mlr bar -c '*' -l 10 -h 20 -w 80 -f x,y
    -c: default '*'
    -x: default '#'
    -l: default 0
    -h: default 100?
    -w: default 80?
    use '#' at unders/overs ...
  keep an array mapping each length up to 100 to a string of that many *'s; populate on alloc.
  note in help, best w/ --opprint or --oxtab
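
  Sketch of the value-to-bar mapping above, in C. Everything named here is
  an assumption for illustration, not existing mlr code: the function name
  bar_render, its signature, space-padding to full width, and the policy
  for unders/overs (a lone '#' at the left edge for unders, a full bar
  ending in '#' for overs). It computes each bar on the fly; the
  precomputed-array idea above would instead build the w+1 possible
  in-range bars once at alloc time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper, not Miller's API: renders value x as a bar of
// exactly w characters in a freshly malloc'd string (caller frees).
//   fill:   character for in-range bars     (the note's -c, default '*')
//   oob:    character marking unders/overs  (the note's -x, default '#')
//   lo, hi: value range mapped onto 0..w    (the note's -l and -h)
char* bar_render(double x, double lo, double hi, int w, char fill, char oob) {
    assert(w > 0 && hi > lo);
    char* bar = malloc((size_t)w + 1);
    if (bar == NULL)
        return NULL;
    memset(bar, ' ', (size_t)w);
    bar[w] = '\0';

    if (!(x >= lo)) {
        // Under (this test also catches NaN): assumed policy is a lone
        // marker at the left edge.
        bar[0] = oob;
    } else if (x > hi) {
        // Over: assumed policy is a full bar ending in the marker.
        memset(bar, fill, (size_t)w);
        bar[w - 1] = oob;
    } else {
        // In range: linear scale from [lo, hi] to 0..w, truncated.
        int n = (int)((x - lo) / (hi - lo) * w);
        memset(bar, fill, (size_t)n);
    }
    return bar;
}
```

  E.g. bar_render(15.0, 10.0, 20.0, 10, '*', '#') yields five *'s followed
  by five spaces.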

MINOR: popall and/or xfer method; use @ sort; replace append @ streamer & where else?

MINOR: mlrcli topdown
MINOR: brew version bump?!?

MINOR: hold-and-fit regressor doc: 'then put' for residuals; note avoids two-pass & the saving of fit parameters
MINOR: faqent re histo w/ min/max is effectively 2-pass (unless you have prior knowledge about the data).
  note count-distinct w/ int() func
MINOR: faqent on two-pass lin/logi reg
MINOR: faqent on xml/json/etc: generally no to recursive data structures; punctuation re-writes w/ well-formatted;
  in-language handling in case not field/line formatted. see h.p. mail.
MINOR: faqent on polyglottal dkvp/etc production
MINOR: faqent re join without -u -- it's a weird default & i'm imitating un*x join
MINOR: faqent re cat xyz | mlr ... vs. mlr ... xyz:
  o easier to up-arrow/control-P & tack on a then-statement
  o lose no function except FILENAME/FNR
  o lose ~10% perf due to no mmap

MAJOR: statsn covar, ols, logistic: port material from my stats_m/sackmat_m for much of that
MAJOR: uni/multivariate logistic for ternary & above

MINOR: --mmap @ mlr -h
MINOR: ctype ff @ bld.out
MINOR: sampler UTs (w/ spec rand seed)
MINOR: bus-insurance dev page
MINOR: even absent manpage autogen from C, at least manually update mlr.1 to match current output

MAJOR: multiple -a/-f in stats1/2? (still only one -g ...)

----------------------------------------------------------------
MAJOR: regex

* mlr --regex-help; xrefs from put/filter/et al. -h's; into mld??
* optimize starts-with/ends-with

----------------------------------------------------------------
MAJOR: manpage

* xroff links:
  - http://www.linuxhowtos.org/System/creatingman.htm
  - https://www.gnu.org/software/groff/manual/html_node/Man-usage.html
* solve the duplication problem, and minimize dependencies:
  ? mlr --manpage all in C?
  ? maybe generate most content in C with a post-processor in some widely
    portable language (e.g. perl is gross but is everywhere) to add in the
    groff markups.
  ? maybe something template-driven vaguely like poki??
  - whatever it is it needs to be as automated as possible

LEVELS:
* figure out a2x/xml/...
* poki
* ... what else ...
* write those up in the docs, including required packages

----------------------------------------------------------------
MAJOR: csv mem-leak/read-perf

* current option runs faster w/o free, apparently due to heap-fragging
  o memory leak in csv reader! careful about slls data, and do not use lrec_put_no_free
  o redo inline-pasting but this time correctly weight the fragging effect
  o power-of-two
* for stdio, needs some thought ...
* ... but for mmap, it's almost always not necessary to strdup at all:
  only on escaped-double-quote case.
* denormalize the pbr & make stdio pbr & ptr-backed (mmap,UT-string) pbr.
* code-dup (yes, sadly!) the CSV reader into two & do strdups in stdio
  but lrec_put w/ !LREC_FREE_VALUE for ptr-backed.
* or *maybe* pbr retain/free-flags for string/mmap w/o denorm, but only
  if it's both elegant and fast
! experimental/getlines.c shows that even without the heap-fragging
  issue, pfr+psb is 3.5x slower than getdelim. again suggesting
  multi-char-terminated getdelim might be the best line of approach.

----------------------------------------------------------------
MINOR:

* double-check for off-by-one buflen in cline/sline
* off-by-one error on fnr dkvp errmsg?
* scroll-stalls in mlrdoc!! really bad on the droid.
* hash-chain ifdef instrumentation -> maybe find a better hash function out there
* dsls/ build outside of pwd? or just lemon $(absdir)/filenamegoeshere.y?
* pprint join?
* comma-number -- using locale?
* poki cover -> readme
* include lemon-generated .c/.h or not
* lemon in-dir -- cf wiz note
* gprof link with -lc on FreeBSD -- ?

================================================================
HN FEEDBACK 2015-08-15 (https://news.ycombinator.com/item?id=10066742)

look-ats:
* cq?
* https://github.com/harelba/q
* https://github.com/google/crush-tools
* https://github.com/BurntSushi/xsv
* https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_parsers.py
* https://drill.apache.org
* https://github.com/dbro/csvquote

xperf:
* post rust/go cmps

================================================================
NEATEN

!! xxx's in the code
* source hygiene: top-of-header comments, readme re memory management, etc.

================================================================
COOKBOOK

* doc w/ very specific examples of sed/grep/etc preprocessing to structurize semi-structured data (e.g. logs)

================================================================
MEM MGMT:

* full void-star-payload frees
* multi-level frees in stats1/stats2/step subcmds (control-plane structures)
* multi-level frees in stats1/stats2/step hashmaps (data-plane structures)
* _free funcptr/funcs for mappers
* free last rec in streamer?
* look at strdups in other lhm*
* look at any other strdups
* note that this free-at-end is highly pedantic *except* it allows me to check valgrind 100% leak-free
  to be sure i don't miss the ones that *do* matter in the record-loops

================================================================
NIDX/DKVP/...:

* maybe have a mode where "a" (not "a=1") -> "a=" with dkvp and "1=a" with nidx? 3rd format? 3 flavors
  of one format??

* dkvp as generalization of nidx. restructure mlrwik to emphasize this.
  tightly integrate 'mlr label'. maybe rename 'mlr label' to 'mlr name' or
  some such.  perhaps entirely coalesce nidx&dkvp in the code & the docs;
  presumably with a different name.  something about "header with data" or
  "key with value"?? lower-cased only rather than making it an acronym?

* nidx via field widths; left/right space-strip -- *only* if headers also don't have whitespace!!!

* maybe call dkvp labeled-index fmt
* definitely put nidx before dkvp in the mlrwik/formats page
* "index-numbered" -> "implicitly index-numbered" in mlrwik

================================================================
FUNCTIONS
? index   (i_ss) -- not very useful unless there are functions which take an index as an argument ...
? bit ops (i_ii) & | ^ << >>
? log2, exp2

================================================================
UT/REG
* cat/X/cat for all X
* tac/X/cat for all X
* cat/X/tac for all X
* tac/X/tac for all X
* multi-csv I/O: include --icsvlite --odkvp and --idkvp --ocsv, as well as --csv cases
* het-xtab out
* modulus operator
* strlen
* make should-fail machinery & use it for null-key dkvp cases.
* all mathlib funcs
* int/float/string
* roundm function: round to multiple of m
* boolean() function
* boolean-valued put, e.g. mlr put '$ok = $x <= 10'
* nullability cases, esp. sort, and math funcs
* join with het data
* join with mixed-format/separator (left vs. right)
* join with left/right-prefix
* mmap/stdio UTs; run all cases with --mmap and again with --no-mmap
* all __X_MAIN__ instances -> UT code (effectively all-but-dead code at present)
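
For the roundm item above, a reference implementation to test against
might look like the following. This is a sketch under assumptions, not
the mlr function: the name roundm_int is hypothetical, it handles
integers only, requires m > 0, rounds ties away from zero, and ignores
overflow near the limits of long long.

```c
#include <assert.h>

// Hypothetical reference for UT purposes: rounds x to the nearest
// multiple of m (m > 0). Ties round away from zero (an assumed policy).
long long roundm_int(long long x, long long m) {
    assert(m > 0);
    long long r    = x % m;  // C99: remainder takes the sign of x
    long long base = x - r;  // nearest multiple of m toward zero
    if (r >= 0)
        return (2 * r >= m) ? base + m : base;
    else
        return (-2 * r >= m) ? base - m : base;
}
```

Cases worth covering: exact multiples, ties, and negative inputs, e.g.
roundm_int(7, 5) is 5, roundm_int(8, 5) is 10, and roundm_int(-8, 5)
is -10.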

================================================================
DOC

* Note that PCA is better than OLS for roundoff error (sum of squares ...):
  grep red data/multicountdown.txt | head -n 13 | mlr --opprint stats2 -a linreg-ols -f t,count
  grep red data/multicountdown.txt | head -n 14 | mlr --opprint stats2 -a linreg-ols -f t,count

================================================================
IMPROVEMENTS

* free-flag for string mlrvals

* run go/d/etc on sprax & include #'s in perf pg, and/or rm xref in the latter & just post xlang perf #'s there
* link to gh/jk/m xlang impls ... and/or cardify their sources :) ... or maybe just link to gh/jk/m xlang dir
* ack c impl has been repeatedly optimized but even the original version (also cutc.c ...) outperforms

* make a -D for hash-collision stats ...

* update t1.rb including numeric sort; fix appropriateness of -t=

* mlr sort -f -nr x: probably should sweep through all subcmds & disallow args to start
  with '-'

================================================================
HARDER HYGIENE
* eliminate compiler warnings for lemon & its autogenerated code

================================================================
PYTHON
* pgr + stats_m same I/O modules??

================================================================
FYI

Semantic versioning:
Given a version number MAJOR.MINOR.PATCH, increment the:

* MAJOR version when you make incompatible API changes,
* MINOR version when you add functionality in a backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.

Initial release:
https://news.ycombinator.com/item?id=10066742
v2.0.0:
https://news.ycombinator.com/item?id=10132831
