================================================================
BUGFIXES

* leak:
  vgrun mlr put -q '@s=$x; end{unset @s}' ./reg_test/input/unset1.dkvp

* dig into neatener ...

* json input:

  vgrun mlr --ijson --opprint cat reg_test/input/small-non-nested-wrapped.json

================================================================
TOP-OF-LIST SUMMARY

! feature: record-repeater w/ key field(s). UT & mld.
! feature: stats1 weighted by count (or other) field?

? make a 3.5.0 with nest, shuffle, put/filter -f, cat -n bugfix, etc.? after valgrinds ...
* rh/fedora/centos mlr-3.4.0

* reduce parameter-marshaling by pevalbag; measure perf delta (suspect negligible)
* neaten/xxxes/valgrind/UT/doc for emit/unset/tri-split-null/+=etal
* double-check oosvar+emit+tri-split-null by re-implementing as many mappers as possible using oosvars + emit
* readme re srec vs lrec, & oosvar vs mlhmmv
! major doc section re oosvars, $*, and emits

* allow begin/end srec assignments in the grammar, to allow better error messages than 'syntax error' in the caller
! valgrinds

* 64-bit lengths for containers. test with 5-billion-integer-seq data.

! screencaps for README.md & cover !
* paragraph filter: mlr --nidx --rs '\n\n' --fs '\001'  filter -x '$1 =~ "slls_from_line" && $1 =~ "mapper_reshape_parse_cli"'

? call for use-cases? (go first.)
* olh xrefs between reshape & nest; between sample, bootstrap, & shuffle.
* doc section on the programmability spectrum (perl to asic) & how much effort is/isn't worth putting into miller. & maybe it should have been pick-a-language w/ I/O support and an API.
* doc rename-all w/ regex-captures: e.g. de-spacifying field names etc.  echo 'a x z=1,b x z=1' | mlr rename -r -g ' ,:'  yields  a:x:z=1,b:x:z=1
* in why-miller: list the intersection points. perf; multifmt; math+strings; expressive/programmable; compact notation; pipe-friendly.
* lrec get followed by put/remove: getext variant returning node for unlink, valpoke, or null == append to avoid double-searching.
* dsls deps still not quite right?
* faqent re nidx output: '$9 = ...' doesn't make it the 9th output field.
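Re the lrec getext item above: a minimal C sketch of the single-search idea, assuming a plain singly linked key/value list. The struct and function names (kv_node, kv_rec, kv_find_ext, kv_put, kv_remove) are hypothetical and are not Miller's actual lrec API. One scan returns both the matching node and its predecessor, so put can poke the value in place or append, and remove can unlink, without searching twice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key/value record; NOT Miller's real lrec struct. */
typedef struct kv_node {
    const char *key;
    const char *value;
    struct kv_node *next;
} kv_node;

typedef struct {
    kv_node *head;
    kv_node *tail;
    size_t count;
} kv_rec;

/* One scan: returns the matching node or NULL. If pprev is non-NULL it
   receives the predecessor of the match (or the tail when not found). */
kv_node *kv_find_ext(kv_rec *rec, const char *key, kv_node **pprev) {
    kv_node *prev = NULL;
    for (kv_node *p = rec->head; p != NULL; prev = p, p = p->next) {
        if (strcmp(p->key, key) == 0) {
            if (pprev) *pprev = prev;
            return p;
        }
    }
    if (pprev) *pprev = prev;
    return NULL;
}

/* Put: overwrite in place if the key exists, else append. Single search. */
int kv_put(kv_rec *rec, const char *key, const char *value) {
    assert(rec != NULL && key != NULL);
    kv_node *node = kv_find_ext(rec, key, NULL);
    if (node != NULL) {
        node->value = value;
        return 0;
    }
    node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->key = key;
    node->value = value;
    node->next = NULL;
    if (rec->tail) rec->tail->next = node; else rec->head = node;
    rec->tail = node;
    rec->count++;
    return 0;
}

/* Remove: reuses the predecessor from the same scan to unlink.
   Returns 1 if a node was removed, 0 if the key was absent. */
int kv_remove(kv_rec *rec, const char *key) {
    kv_node *prev = NULL;
    kv_node *node = kv_find_ext(rec, key, &prev);
    if (node == NULL) return 0;
    if (prev) prev->next = node->next; else rec->head = node->next;
    if (rec->tail == node) rec->tail = prev;
    rec->count--;
    free(node);
    return 1;
}
```

The trade-off is a slightly wider return contract (node plus predecessor) in exchange for halving the scans on put/remove-heavy paths.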

================================================================
* relcut material since 3.4.0:
  BIG:
  - oosvars
  - begin/end
  - emit/all
  - unset/all
  - bare-boolean and pattern-action
  - put -q
  SMALL:
  - nest
  - shuffle
  - repeat
  - put -f / filter -f
  - \t \" et al. in DSL string literals
  - typeof
  - --nr-progress-mod
  - cat -n bugfix

================================================================
~@ PRE4 @~

----------------------------------------------------------------
IMPLEMENT:

! declare @ doc what the 'f' stands for in 'emitf': first-level? :/

? people may ask for ingest version of dump ...

> do more oosvar experiments before finalizing mlr4

----------------------------------------------------------------
HYG:

k xxxes
* valgrinds

----------------------------------------------------------------
UT:

* define/test behavior for null-valued filter / cond-block

  mlr put 'filter @nonesuch' ../data/small
  mlr put 'filter @nonesuch==true' ../data/small
  mlr put 'filter $nonesuch==true' ../data/small
  mlr put 'filter $nonesuch' ../data/small

  mlr put -q '@v=1; @nonesuch { emit @v}' ../data/small
  mlr put -q '@v=1; @nonesuch==true { emit @v}' ../data/small
  mlr put -q '@v=1; $nonesuch { emit @v}' ../data/small
  mlr put -q '@v=1; $nonesuch==true { emit @v}' ../data/small

* evolve & extend the existing UTs

----------------------------------------------------------------
DOC:

* oosvar-to-oosvar assignments (w&w/o subselectors) with treecopy
* $*
* emit, multi-emit, emit-all @ test-mlhmmv. note assigning rhs="" vs. removing the key: unset does the latter, since assigning rhs="" does the former.
  cf cut -x.
* unset, multi-unset, unset-all @ test-mlhmmv
* += et al.
* horizontality of emits
* side-by-side pasting:
  mlr --opprint put -q '@v[NR]["sum"] += $x; @v[NR]["count"] += 1; end{emit all,"NR"}' reg_test/input/abixy-het 

w subselector emits ... after syntax rework
w basenaming syntax for subselector emits (once it exists!)

* valgrind cleaner (iterate on):
  mlr --nidx --rs '\n\n' --fs '\001' put -q '$1 =~ "^mlr" {@a=$1}; $1 =~ "by 0x"{@b=$1; @c=@a."\n".@b; emit @c}; ' vg.out

================================================================
OOSVAR EXPERIMENTS:

* check for functionality -- this is my best chance to do a full-featuredness check for oosvars & emit
* also do timing comparisons

* list:
  - bootstrap?
  - count-distinct?
  - cut?
  - decimate? easy now with NR%10 if not -g; what about with -g?
  - group-by?
  - group-like?
  - having-fields?
  - head?
  - histogram?
  - (rename? reorder?)
  - sample, w/ & w/o -g?
  - stats1 count,sum,mean,stddev,var,skew,kurt; mode; min,max?
  - stats2 linregs,r2,corr,cov?
  - step delta,from-first,ratio,rsum,counter,ewma?
  - tail?
  > top?
  - uniq?

* perf notes:

  $ time mlr stats1 -a mean -f x ../data/big.dkvp
  x_mean=0.499567
  real    0m1.486s 0m1.500s 0m1.637s

  $ time mlr put -q 'begin @x_count=0;begin @x_sum=0.0;@x_sum = @x_sum+$x;@x_count=@x_count+1; end @x_mean=@x_sum/@x_count; end emit @x_mean'  ../data/big.dkvp
  x_mean=0.499567
  real    0m2.262s 0m2.371s 0m2.356s

  mlr put 'begin @a=0.1;begin @b=0.9;$e=NR==1?$x:@a*$x+@b*@e;@e=$e' then step -a ewma -d 0.1 -f x ../data/big.dkvp
  time mlr put 'begin @a=0.1;begin @b=0.9;$e=NR==1?$x:@a*$x+@b*@e;@e=$e' ../data/big.dkvp > /dev/null
  real  0m3.976s 0m3.971s 0m3.944s
  time mlr step -a ewma -d 0.1 -f x ../data/big.dkvp > /dev/null
  real  0m3.250s 0m3.283s 0m3.300s
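For reference when comparing the two timings above: a minimal C sketch of the EWMA recurrence that the put expression computes, with the first record seeding the average (e = x on NR==1). The function name ewma is hypothetical; this is not Miller's step implementation, and whether step -a ewma seeds identically is an assumption to double-check.

```c
#include <assert.h>
#include <stddef.h>

/* EWMA with smoothing factor alpha in (0, 1]:
     out[0] = xs[0]
     out[i] = alpha*xs[i] + (1-alpha)*out[i-1]
   With alpha = 0.1 this mirrors @a=0.1, @b=0.9 in the put above.
   Seeding with the first value is an assumption matching that put. */
void ewma(const double *xs, double *out, size_t n, double alpha) {
    assert(alpha > 0.0 && alpha <= 1.0);
    double beta = 1.0 - alpha;
    for (size_t i = 0; i < n; i++) {
        out[i] = (i == 0) ? xs[0] : alpha * xs[i] + beta * out[i - 1];
    }
}
```

E.g. inputs 1, 2, 3 with alpha 0.1 give 1, 1.1, 1.29.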

================================================================
NON-PRE4
================================================================

* look into 'perf'
* linreg in terms of stats only ... check py book
* option for: print to stderr NR if NR%N==0 for user-specified N
* cat/cut langcomps (w/ gh links) -> perf page

* pipe-viewer-like feature to stderr?

* interpolated percentiles

? DSL comments: replace with spaces from # to EOL/EOS, whichever comes first
* stats1/stats2 sliding-window feature? and/or with ewma-coefficients (much easier)
  - mean/stddev/var; skew/kurt?
  - linregs; corr/cov?
  ? also, option of weighted stats w/ explicit weights field?
  ? maybe just EWMA with well-known sumw followed by then-chaining. write up the weights if so?

* mld re cross-record stuff is limited to stats1/2 and step
  o this will change with begin/end and oosvars
* tbin/ok -> cookbook
* debian screenshot
* lrec_evaluators cleanup re strict, redundant statements, etc.
* ruby @ optextdep @ mld; poki+mkman
* comma-number -- using locale?
* stdin filename keyword for read-from-file-then-tail-f mode (e.g. mlr etc)
  - needs refactor for lrec_reader_alloc callsite
* perf page: (1) redo; (2) note GNU/etc; (3) compare to mawk (http://invisible-island.net/mawk/)
* EOS comments thruout
* valgrind note @ new dev page/section
* join: final sllv_free in destructor (lo-pri)
* anim ref https://github.com/edi9999/path-extractor

? json arrays -> nested w/ some delimiter? only if all array elements are terminals.

* flight misc: .screenrc -> dotfiles; more dotfile currency
* cump -> one-offs

----------------------------------------------------------------
NARRATIVE INTRO:
* sql example
* logging example
* csv example
* "what do these have in common?"

----------------------------------------------------------------
COOKBOOK/FAQ/ETC.:

* cookbook:
  - eval stuff from https://github.com/johnkerl/miller/issues/88

    $ mlr --csvlite stats2 -a linreg-pca  -f x,y x
    x_y_pca_m,x_y_pca_b,x_y_pca_n,x_y_pca_quality
    1.030300,0.949250,4,0.999859
    $ mlr --csvlite --odkvp --ofs semicolon stats2 -a linreg-pca  -f x,y x
    x_y_pca_m=1.030300;x_y_pca_b=0.949250;x_y_pca_n=4;x_y_pca_quality=0.999859
    $ eval $(mlr --csvlite --odkvp --ofs semicolon stats2 -a linreg-pca  -f x,y x)
    $ echo $x_y_pca_m
    1.030300

  - hold-and-fit regressor doc: 'then put' for residuals; note avoids two-pass &
    the saving of fit parameters
  - histo w/ min/max is effectively 2-pass (unless you have prior knowledge about the data).
    note count-distinct w/ int() func.
  - two-pass lin/logi reg vs. hold-and-fit.

  - very specific R/mysql/etc inputs/outputs

  - polyglot dkvp/etc production.
  - very specific examples of sed/grep/etc preprocessing to structurize semi-structured data (e.g. logs)

  - checku.dash -> cookbook

* faqents:
  - rsum as proxy for per-record/agg-only mixed output

* other doc besides cookbook & faq:
  - R doc:
    ! xref @ covers x 2
    ! be very clear streaming vs. dataframe -- each has things the other can't do
    ! emph mlr has light stats but for heavyweight analysis use R et al.

* --mmap @ mlr -h
* bus-insurance dev page

================================================================
* ect feature?
  -> maybe better in cookbook ...
  - in1 optional: t (epoch seconds); default systime()
  - in2: nleft
  - in3 optional: target #/field name
  - in4 optional: -s flag or not
  - out1: etchours
  - out2: etcstamp

  o expose mapper_stats2_alloc
  o expose mapper_cut_alloc
  o encapsulate the following:
    mlr put '$t=systime()' \
      then filter 'NR>4' \
      then put '$nleft=$target-$n' \
      then stats2 -s -a linreg-pca -f t,nleft \
      then put '$etc= -$t_n_pca_b/$t_n_pca_m; $etcstamp=sec2gmt($etc); $etchours=($etc-systime())/3600.0'

* mlr step -a from-first -f t \
    then cut -o -f t_from_first,ntodo \
    then step -a ewma -d 0.005,0.01,0.1 -o a,b,c -f ntodo \
    then stats2 -s -a linreg-pca -f \
      t_from_first,ntodo,t_from_first,ntodo_ewma_a,t_from_first,ntodo_ewma_b,t_from_first,ntodo_ewma_c \
    then put '
      $ect0 = -$t_from_first_ntodo_pca_b/$t_from_first_ntodo_pca_m;
      $ecta = -$t_from_first_ntodo_ewma_a_pca_b/$t_from_first_ntodo_ewma_a_pca_m;
      $ectb = -$t_from_first_ntodo_ewma_b_pca_b/$t_from_first_ntodo_ewma_b_pca_m;
      $ectc = -$t_from_first_ntodo_ewma_c_pca_b/$t_from_first_ntodo_ewma_c_pca_m
    ' \
    then cut -o -f t_from_first,ect0,ecta,ectb,ectc
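The ETC pipelines above fit nleft (or ntodo) against t and take the zero crossing of the fitted line, t = -b/m (the $etc / $ect0 expressions). A minimal C sketch of that step, assuming ordinary least squares in place of the linreg-pca fit the commands use (the two fits can differ when both axes are noisy). The function names ols_fit and estimate_etc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least-squares fit y = m*x + b (swapped in for linreg-pca).
   Returns 0 on success, -1 if n < 2 or all x values are equal. */
int ols_fit(const double *xs, const double *ys, size_t n, double *m, double *b) {
    assert(m != NULL && b != NULL);
    double sumx = 0.0, sumy = 0.0, sumxy = 0.0, sumx2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        sumx += xs[i];
        sumy += ys[i];
        sumxy += xs[i] * ys[i];
        sumx2 += xs[i] * xs[i];
    }
    double dn = (double)n;
    double d = dn * sumx2 - sumx * sumx;
    if (n < 2 || d == 0.0) return -1;
    *m = (dn * sumxy - sumx * sumy) / d;
    *b = (sumy - *m * sumx) / dn;
    return 0;
}

/* Estimated completion time: where the fitted line reaches nleft == 0,
   i.e. t = -b/m. Returns -1 if the fit fails or the slope is zero. */
int estimate_etc(const double *t, const double *nleft, size_t n, double *etc) {
    double m, b;
    if (ols_fit(t, nleft, n, &m, &b) != 0 || m == 0.0) return -1;
    *etc = -b / m;
    return 0;
}
```

E.g. t = 0,1,2,3 with nleft = 10,8,6,4 fits m = -2, b = 10, so the estimated completion is t = 5.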

----------------------------------------------------------------
* introduce a fourth, padding separator for all formats? (for leading/trailing strip/skip.)
  o allows 'x = 10' in DKVP
  o allows right-justified keys in XTAB

* hold-and-emit fraction?

* statsn covar, ols, logistic: port material from my stats_m/sackmat_m for much of that

* uni/multivariate logistic for ternary & above?

? wiki quickselect ?

* sllv_free option with callback for void-star-payload free; likewise other void-star-payload containers
* double-check for off-by-one buflen in cline/sline
* hash-collision ifdef instrumentation -> maybe find a better hash function out there
* pprint join?
* lemon in-dir -- cf wiz note
* gprof link with -lc on FreeBSD -- ?

================================================================
UT/REG
* ut cat/X/cat for all X
* ut tac/X/cat for all X
* ut cat/X/tac for all X
* ut tac/X/tac for all X
* ut multi-csv I/O: include --icsvlite --odkvp and --idkvp --ocsv, as well as --csv cases
* ut het-xtab out
* ut modulus operator
* ut make should-fail machinery & use it for null-key dkvp cases.
* ut all mathlib funcs
* ut int/float/string
* ut roundm
* ut join with left/right-prefix

================================================================
DOC

* Note that PCA is better than OLS with respect to roundoff error (sum of squares ...):
  grep red data/multicountdown.txt | head -n 13 | mlr --opprint stats2 -a linreg-ols -f t,count
  grep red data/multicountdown.txt | head -n 14 | mlr --opprint stats2 -a linreg-ols -f t,count
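The sum-of-squares roundoff mentioned above can be seen in isolation with variance. A minimal C sketch, assuming it is representative of the issue: the textbook one-pass formula subtracts two large, nearly equal sums, while Welford's update (a swapped-in technique for illustration, not necessarily what stats1/stats2 use) tracks deviations from a running mean. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook one-pass variance: (sum(x^2) - sum(x)^2/n) / (n-1).
   For large, tightly clustered values the subtraction cancels most
   significant digits, so the result can be badly wrong. */
double var_naive(const double *xs, size_t n) {
    assert(n >= 2);
    double sumx = 0.0, sumx2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        sumx += xs[i];
        sumx2 += xs[i] * xs[i];
    }
    return (sumx2 - sumx * sumx / (double)n) / (double)(n - 1);
}

/* Welford's update: running mean plus running sum of squared
   deviations (m2); never forms large squared sums. */
double var_welford(const double *xs, size_t n) {
    assert(n >= 2);
    double mean = 0.0, m2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double delta = xs[i] - mean;
        mean += delta / (double)(i + 1);
        m2 += delta * (xs[i] - mean);
    }
    return m2 / (double)(n - 1);
}
```

On 1e9+1, 1e9+2, 1e9+3 the Welford version returns exactly 1; the naive one is dominated by roundoff in sums near 3e18.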

================================================================
IMPROVEMENTS

* run go/d/etc on sprax & include #'s in perf pg, and/or rm xref in the latter & just post xlang perf #'s there
* link to gh/jk/m xlang impls ... and/or cardify their sources :) ... or maybe just link to gh/jk/m xlang dir
* ack c impl has been repeatedly optimized but even the original version (also cutc.c ...) outperforms

* update t1.rb including numeric sort; fix appropriateness of -t=

* more use of restrict pointers ... ?

================================================================
PYTHON
* pgr + stats_m same I/O modules??

================================================================
FYI

Semantic versioning:
Given a version number MAJOR.MINOR.PATCH, increment the:

* MAJOR version when you make incompatible API changes,
* MINOR version when you add functionality in a backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.
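The bump rules above, as a minimal C sketch. Per the SemVer spec, bumping MAJOR resets MINOR and PATCH to 0, and bumping MINOR resets PATCH to 0. The struct, enum, and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical version triple and change classification. */
struct semver { unsigned major, minor, patch; };
enum change_kind { CHANGE_INCOMPATIBLE, CHANGE_FEATURE, CHANGE_FIX };

/* Returns the next version for a change of the given kind. */
struct semver semver_bump(struct semver v, enum change_kind kind) {
    switch (kind) {
    case CHANGE_INCOMPATIBLE: /* incompatible API change */
        v.major++;
        v.minor = 0;
        v.patch = 0;
        break;
    case CHANGE_FEATURE:      /* backwards-compatible functionality */
        v.minor++;
        v.patch = 0;
        break;
    case CHANGE_FIX:          /* backwards-compatible bug fix */
        v.patch++;
        break;
    default:
        assert(0 && "unknown change kind");
    }
    return v;
}
```

E.g. adding nest/shuffle/etc. on top of 3.4.0 is a FEATURE bump to 3.5.0, while incompatible DSL changes would go to 4.0.0.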

Initial release: https://news.ycombinator.com/item?id=10066742
v2.0.0  release: https://news.ycombinator.com/item?id=10132831

HN FEEDBACK 2015-08-15 (https://news.ycombinator.com/item?id=10066742).
look-ats:
* cq?
* https://github.com/harelba/q
* https://github.com/google/crush-tools
* https://github.com/BurntSushi/xsv
* https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_parsers.py
* https://drill.apache.org
* https://github.com/dbro/csvquote

https://help.github.com/articles/github-flavored-markdown/

shell: mlr put '$z=$x+$y'
lldb:  run put "\$z=\$x+\$y"

http://include-what-you-use.org/

----------------------------------------------------------------
https://fedoraproject.org/wiki/How_to_create_an_RPM_package
https://wiki.centos.org/HowTos/Packages/ContributeYourRPMs
https://lists.centos.org/pipermail/centos/2012-September/129227.html
https://fedoraproject.org/wiki/Join_the_package_collection_maintainers
https://fedoraproject.org/wiki/How_to_get_sponsored_into_the_packager_group
https://fedoraproject.org/wiki/Package_Review_Process
https://docs.fedoraproject.org/ro/Fedora_Draft_Documentation/0.1/html/RPM_Guide/ch11s03.html
http://wiki.networksecuritytoolkit.org/nstwiki/index.php/HowTo_Create_A_Patch_File_For_A_RPM

================================================================
git remote add upstream https://github.com/Homebrew/homebrew
git fetch upstream
git rebase upstream/master
shasum -a 256 ../mlr-3.2.2.tar.gz
git diff HEAD^  HEAD
git diff HEAD^2 HEAD

----------------------------------------------------------------
git remote add upstream https://github.com/Homebrew/homebrew
git fetch upstream
git rebase upstream/master
git checkout -b miller-3.4.0
shasum -a 256 ../mlr-3.4.0.tar.gz
git add ...
git commit -m 'miller 3.4.0'
git push -u origin miller-3.4.0
submit the pull request

----------------------------------------------------------------
Squash commits by:
  brew update
  git checkout $YOUR_BRANCH
  git rebase --interactive origin/master
  mark each commit other than the first as "squash" or "fixup"
  git push -f

http://codeinthehole.com/writing/pull-requests-and-other-good-practices-for-teams-using-github/
