Home
Ian Mayo edited this page Jul 23, 2014 · 25 revisions
Welcome to the data-wrangler wiki!
- Here's a mockup containing the current (July 2014) thoughts.
- Here is a link to some high-level thoughts regarding the app: high level thoughts
- open data file for wrangling
- open data file for wrangling using pre-existing input schema
- convert data file to alternate format using pre-existing conversion schema
- merge two input files (user specifies key/foreign key)
- load file from cloud-storage (columns will be auto-assigned)
- just view Head rows (10 rows?) - default for huge files
- just view Tail rows (10 rows?)
- re-sample at alternate frequency (only columns that have been annotated with type/units)
- export using pre-existing output schema (essential cols for that schema must have been specified)
- delete incomplete/invalid rows
- interpolate incomplete/invalid cells (for annotated columns)
- specify delimiter(s)
- specify fixed widths
- specify message type column (after doing this, there's a set of column definitions for each message type)
- filter for message type
- provide persistent document annotations for file type (source, link to format reference)
- register input schema for re-use in application
- export input schema for exchange
- push file to cloud storage
- provide version tracking for all file modifications (with comments - see Version History below)
- specify data type & units
- add calculated column based on selected one(s) [see below for calculation examples]
- apply time offset (for time data type)
- view statistical overview (5 number summary for quantitative, other for qualitative)
- view xy-plot (if quantitative; plotted against time if a time column has been annotated)
- view value listing (if qualitative; only the first few values if there are many)
- remove outliers (if quantitative, configurable)
- mark as identity column (used for grouping data for export, e.g. TgtId)
- delete column
- hide (ignore) column
- provide persistent column annotations
- delete row
- insert new row (duplicating previous one)
- replace value with interpolated one
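
As a rough illustration of three of the operations listed above (delete incomplete/invalid rows, interpolate incomplete/invalid cells, and the 5 number summary), here is a minimal Python sketch. It is not part of data-wrangler: every function name (`read_rows`, `to_float`, `drop_incomplete`, `interpolate`, `five_number_summary`) is hypothetical, and it assumes a delimited text file with a header row and uses the standard library only.

```python
import csv
import io
import statistics


def read_rows(text, delimiter=","):
    """Parse delimited text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_float(cell):
    """Return the cell as a float, or None if it is empty or not numeric."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return None


def drop_incomplete(rows, column):
    """Delete rows whose value in `column` is missing or invalid."""
    return [row for row in rows if to_float(row[column]) is not None]


def interpolate(values):
    """Fill None gaps linearly between the nearest valid neighbours.

    Gaps before the first or after the last valid value stay None.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a quantitative column."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


# Hypothetical sample data: one empty cell and one invalid cell.
SAMPLE = "time,speed\n0,10\n1,\n2,14\n3,bad\n4,20\n"
rows = read_rows(SAMPLE)
speeds = [to_float(row["speed"]) for row in rows]
filled = interpolate(speeds)
```

Dropping rows and interpolating cells are alternatives for the same bad data, so a real tool would probably let the user pick one per column. Quartile definitions also vary between tools; the "inclusive" method is just one reasonable choice.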
A series of operations will be available for column-related activities. These operations will typically produce one new column based on one existing column, but scripting (or a custom UI) will allow one or more (1-*) input columns to generate one or more (1-*) output columns.
- inverse
- parse date/time text
- parse latitude/longitude text
- add time offset
- units conversions (time, distance, area, velocity, acceleration)
- multiply by value
- add value
- smooth (range of smoothing types)
- apply user-provided script (the Eclipse EASE project offers a range of scripting languages)
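
The "one existing column in, one new column out" pattern behind the operations above could be sketched as follows. This is only an illustration under stated assumptions: rows are held as a list of dicts, and all names (`derive_column`, `parse_time`, `add_offset`, `multiply_by`, `moving_average`, `KNOTS_TO_MPS`) are hypothetical rather than anything defined by the application. The moving average stands in for just one of the possible smoothing types.

```python
from datetime import datetime, timedelta

# Units conversion factor: 1 knot = 1852 m per 3600 s.
KNOTS_TO_MPS = 1852 / 3600


def derive_column(rows, source, target, operation):
    """Return new rows where `target` holds operation(row[source])."""
    return [{**row, target: operation(row[source])} for row in rows]


def parse_time(fmt):
    """Build an operation that parses date/time text with the given format."""
    return lambda text: datetime.strptime(text, fmt)


def add_offset(seconds):
    """Build an operation that shifts a datetime by a fixed offset."""
    delta = timedelta(seconds=seconds)
    return lambda moment: moment + delta


def multiply_by(factor):
    """Build an operation that scales a numeric value (e.g. unit conversion)."""
    return lambda value: value * factor


def inverse(value):
    """Return 1 / value."""
    return 1.0 / value


def moving_average(values, window=3):
    """Smooth with a centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical input rows with a text time column and a speed in knots.
rows = [
    {"time": "2014-07-23 10:00:00", "speed_kts": 10.0},
    {"time": "2014-07-23 10:01:00", "speed_kts": 12.0},
]
rows = derive_column(rows, "time", "time_parsed", parse_time("%Y-%m-%d %H:%M:%S"))
rows = derive_column(rows, "time_parsed", "time_shifted", add_offset(3600))
rows = derive_column(rows, "speed_kts", "speed_mps", multiply_by(KNOTS_TO_MPS))
```

Because each operation is just a function of one value, a user-provided script could plug into `derive_column` in the same way. Smoothing needs the whole column at once, which is why `moving_average` takes a list instead.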
Sample of version history: