================================================================
BUGFIXES

:D

================================================================
TOP OF LIST

* faqents/cookbook

* csv read perf
* mem-mgmt scrutinize
* misc neaten issues

* packaging/currency:
  k brew
  - netbsd
  w debian
  ? redhat ?
  ? other ?

----------------------------------------------------------------

* cookbook:
  - eval stuff from https://github.com/johnkerl/miller/issues/88

    $ mlr --csvlite stats2 -a linreg-pca  -f x,y x
    x_y_pca_m,x_y_pca_b,x_y_pca_n,x_y_pca_quality
    1.030300,0.949250,4,0.999859
    $ mlr --csvlite --odkvp --ofs semicolon stats2 -a linreg-pca  -f x,y x
    x_y_pca_m=1.030300;x_y_pca_b=0.949250;x_y_pca_n=4;x_y_pca_quality=0.999859
    $ eval $(mlr --csvlite --odkvp --ofs semicolon stats2 -a linreg-pca  -f x,y x)
    $ echo $x_y_pca_m
    1.030300

  - hold-and-fit regressor doc: 'then put' for residuals; note avoids two-pass &
    the saving of fit parameters
  - histo w/ min/max is effectively 2-pass (unless you have prior knowledge about the data).
    note count-distinct w/ int() func.
  - two-pass lin/logi reg vs. hold-and-fit.

  - R/mysql/etc inouts

  - polyglottal dkvp/etc production.
  - very specific examples of sed/grep/etc preprocessing to structurize semi-structured data (e.g. logs)

* faqents:
  - rsum as proxy for per-record/agg-only mixed output

* other doc besides cookbook & faq:
  - R doc:
    ! xref @ covers x 2
    ! be very clear streaming vs. dataframe -- each has things the other can't do
    ! emph mlr has light stats but for heavyweight analysis use R et al.

* comma-number -- using locale?
* poki cover -> readme

* --mmap @ mlr -h
* bus-insurance dev page

----------------------------------------------------------------
* lrec_eval nullable etc. cleanup

* stdin filename keyword for read-from-file-then-tail-f mode (e.g. mlr etc)

* bootstrap sampling in hold-and-emit mode?? needs an lrec_copy

----------------------------------------------------------------
MAJOR: csv mem-leak/read-perf

* current option runs faster w/o free, apparently due to heap-fragging
  o memory leak in csv reader! careful about slls data, and do not use lrec_put_no_free
  o redo inline-pasting but this time correctly weight the fragging effect
  o power-of-two
* for stdio, needs some thought ...
* ... but for mmap, it's almost always not necessary to strdup at all:
  only on escaped-double-quote case.
* denormalize the pbr & make stdio pbr & ptr-backed (mmap,UT-string) pbr.
* code-dup (yes, sadly!) the CSV reader into two & do strdups in stdio
  but lrec_put w/ !LREC_FREE_VALUE for ptr-backed.
* or *maybe* pbr retent/free-flags for string/mmap w/o denorm, but only
  if it's both elegant and fast
! experimental/getlines.c shows that even without the heap-fragging
  issue, pfr+psb is 3.5x slower than getdelim. again suggesting
  multi-char-terminated getdelim might be the best line of approach.
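
  sketch of what a multi-char-terminated getdelim would have to do, written in
  python only to pin down the semantics (this is not the c reader and says
  nothing about speed). read_records, terminator, and chunk_size are made-up
  names. the case to get right is a terminator split across two reads:

```python
import io


def read_records(stream, terminator=b"\r\n", chunk_size=65536):
    """Yield records from a binary stream, split on a multi-byte terminator.

    Hypothetical helper for illustration; the terminator is not included
    in the yielded records.
    """
    if not terminator:
        raise ValueError("terminator must be non-empty")
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Keep the unconsumed tail in buf so that a terminator straddling
        # two chunks is still found after concatenation.
        buf += chunk
        start = 0
        while True:
            idx = buf.find(terminator, start)
            if idx < 0:
                break
            yield buf[start:idx]
            start = idx + len(terminator)
        buf = buf[start:]
    # A final record with no trailing terminator is still emitted.
    if buf:
        yield buf
```

  e.g. list(read_records(io.BytesIO(b"a,b\r\nc,d\r\n"))) gives the two lines
  without their CRLFs.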

================================================================
OTHER
----------------------------------------------------------------
* ect feature?
  -> maybe better in cookbook ...
  - in1 optional: t (epoch seconds); default systime()
  - in2: nleft
  - in3 optional: target #/field name
  - in4 optional: -s flag or not
  - out1: etchours
  - out2: etcstamp

  o expose mapper_stats2_alloc
  o expose mapper_cut_alloc
  o encapsulate the following:
    mlr put '$t=systime()' \
      then filter 'NR>4' \
      then  put '$nleft=$target-$n' \
      then stats2 -s -a linreg-pca -f t,nleft \
      then put '$etc= -$t_nleft_pca_b/$t_nleft_pca_m; $etcstamp=sec2gmt($etc); $etchours=($etc-systime())/3600.0'
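
  the fit-and-extrapolate step in the pipeline above, as a rough python sketch.
  assumptions: it uses plain least squares with centered sums rather than
  linreg-pca, and fit_line, estimate_completion, and hours_until are made-up
  names. idea: fit nleft = m*t + b, then nleft hits zero at t = -b/m.

```python
def fit_line(samples):
    """Least-squares fit of y = m*t + b over (t, y) pairs.

    Uses centered sums (not PCA); hypothetical helper for illustration.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    tbar = sum(t for t, _ in samples) / n
    ybar = sum(y for _, y in samples) / n
    sxx = sum((t - tbar) ** 2 for t, _ in samples)
    sxy = sum((t - tbar) * (y - ybar) for t, y in samples)
    if sxx == 0:
        raise ValueError("all samples have the same t")
    m = sxy / sxx
    b = ybar - m * tbar
    return m, b


def estimate_completion(samples):
    """Return the time at which the fitted nleft line crosses zero."""
    m, b = fit_line(samples)
    if m == 0:
        raise ValueError("no progress: fitted slope is zero")
    return -b / m


def hours_until(etc, now):
    """Hours from now until etc, both in epoch seconds."""
    return (etc - now) / 3600.0
```

  samples are (t, nleft) pairs, t in epoch seconds; pass systime-like "now" to
  hours_until to get the etchours value.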

----------------------------------------------------------------
* introduce a fourth, padding separator for all formats? (for leading/trailing strip/skip.)
  o allows 'x = 10' in DKVP
  o allows right-justified keys in XTAB
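
  what the padding separator would mean for DKVP, as a python sketch (not the
  c reader). parse_dkvp and its parameter names are made up; keying a
  separator-less field by its 1-up position is an assumption of this sketch.

```python
def parse_dkvp(line, ifs=",", ips="=", padding=" "):
    """Parse one DKVP line, stripping padding around keys and values.

    Hypothetical helper: ifs is the field separator, ips the pair
    separator, and padding the proposed fourth separator.
    """
    record = {}
    for position, pair in enumerate(line.split(ifs), start=1):
        key, sep, value = pair.partition(ips)
        if sep:
            record[key.strip(padding)] = value.strip(padding)
        else:
            # Assumption: a field with no pair separator is keyed by position.
            record[str(position)] = key.strip(padding)
    return record
```

  with this, 'x = 10, y=abc ' parses to x=10,y=abc.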

MINOR: hold-and-emit fraction?

MAJOR: statsn covar, ols, logistic: port material from my stats_m/sackmat_m for much of that

MAJOR: uni/multivariate logistic for ternary & above?

? wiki quickselect ?

----------------------------------------------------------------
MINOR:

* double-check for off-by-one buflen in cline/sline
* hash-chain ifdef instrumentation -> maybe find a better hash function out there
* dsls/ build outside of pwd? or just lemon $(absdir)/filenamegoeshere.y?
* pprint join?
* include lemon-generated .c/.h or not
* lemon in-dir -- cf wiz note
* gprof link with -lc on FreeBSD -- ?

================================================================
HN FEEDBACK 2015-08-15 (https://news.ycombinator.com/item?id=10066742)

look-ats:
* cq?
* https://github.com/harelba/q
* https://github.com/google/crush-tools
* https://github.com/BurntSushi/xsv
* https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_parsers.py
* https://drill.apache.org
* https://github.com/dbro/csvquote

xperf:
* post rust/go cmps

================================================================
NEATEN

!! xxx's in the code
* source hygiene: top-of-header comments, readme re memory management, etc.

================================================================
MEM MGMT:

* full void-star-payload frees
* multi-level frees in stats1/stats2/step subcmds (control-plane structures)
* multi-level frees in stats1/stats2/step hashmaps (data-plane structures)
* _free funcptr/funcs for mappers
* free last rec in streamer?
* look at strdups in other lhm*
* look at any other strdups
* note that this free-at-end is highly pedantic *except* it allows me to check valgrind 100% leak-free
  to be sure i don't miss the ones that *do* matter in the record-loops

================================================================
UT/REG
* cat/X/cat for all X
* tac/X/cat for all X
* cat/X/tac for all X
* tac/X/tac for all X
* multi-csv I/O: include --icsvlite --odkvp and --idkvp --ocsv, as well as --csv cases
* het-xtab out
* modulus operator
* make should-fail machinery & use it for null-key dkvp cases.
* all mathlib funcs
* int/float/string
* roundm function: round to multiple of m
* join with mixed-format/separator (left vs. right)
* join with left/right-prefix
* mmap/stdio UTs; run all cases with --mmap and again with --no-mmap
* all __X_MAIN__ instances -> UT code (effectively all-but-dead code at present)
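
for the roundm item above, a python sketch of the semantics a test would pin
down ("round to multiple of m"). this is an assumed definition, not the mlr
implementation; tie handling in particular is whatever python's round() does.

```python
def roundm(x, m):
    """Round x to the nearest multiple of m (assumed semantics).

    Python's round() sends exact ties to the even quotient, so
    roundm(5, 2) is 4 here; a C implementation may well differ on ties.
    """
    if m == 0:
        raise ValueError("m must be nonzero")
    return round(x / m) * m
```

candidate cases: roundm(7, 5) is 5, roundm(8, 5) is 10, roundm(7.3, 2) is 8.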

================================================================
DOC

* Note that PCA is better than OLS for roundoff error (sum of squares ...):
  grep red data/multicountdown.txt | head -n 13 | mlr --opprint stats2 -a linreg-ols -f t,count
  grep red data/multicountdown.txt | head -n 14 | mlr --opprint stats2 -a linreg-ols -f t,count

================================================================
IMPROVEMENTS

* free-flag for string mlrvals

* run go/d/etc on sprax & include #'s in perf pg, and/or rm xref in the latter & just post xlang perf #'s there
* link to gh/jk/m xlang impls ... and/or cardify their sources :) ... or maybe just link to gh/jk/m xlang dir
* ack c impl has been repeatedly optimized but even the original version (also cutc.c ...) outperforms

* make a -D for hash-collision stats ...

* update t1.rb including numeric sort; fix appropriateness of -t=

* mlr sort -f -nr x: probably should sweep through all subcmds & disallow args to start
  with '-'

================================================================
HARDER HYGIENE
* eliminate compiler warnings for lemon & its autogenerated code

================================================================
PYTHON
* pgr + stats_m same I/O modules??

================================================================
FYI

Semantic versioning:
Given a version number MAJOR.MINOR.PATCH, increment the:

* MAJOR version when you make incompatible API changes,
* MINOR version when you add functionality in a backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.
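
The three rules above as a small Python sketch. bump is a made-up helper name,
and it assumes a plain MAJOR.MINOR.PATCH string with no pre-release or build
suffix. Lower-order parts reset to zero when a higher-order part is incremented.

```python
def bump(version, part):
    """Return version with the given part ('major', 'minor', 'patch') incremented.

    Hypothetical helper; accepts only plain MAJOR.MINOR.PATCH strings.
    """
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")
```

For example, an incompatible change after 2.0.0 gives bump("2.0.0", "major"),
which is "3.0.0".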

Initial release: https://news.ycombinator.com/item?id=10066742
v2.0.0  release: https://news.ycombinator.com/item?id=10132831

----------------------------------------------------------------
git remote add upstream https://github.com/Homebrew/homebrew
git fetch upstream
git diff HEAD^  HEAD
git diff HEAD^2 HEAD
