Home

Welcome to the data-wrangler wiki!

mockup

External docs

Here's a mockup containing the current (July 2014) thoughts.
Here is a link to some high-level thoughts regarding the app: high-level thoughts

Operations

Application level

open data file for wrangling
open data file for wrangling using pre-existing input schema
convert data file to alternate format using pre-existing conversion schema
merge two input files (user specifies key/foreign key)
load file from cloud-storage (columns will be auto-assigned)
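
As an illustration of the merge operation listed above, here is a minimal sketch of joining two input files on a user-specified key/foreign key. The function name `merge_on_key`, the sample column names (`TgtId`, `Id`, `Name`) and the left-join behaviour are assumptions made for this example, not a description of the application's actual implementation.

```python
import csv
import io

def merge_on_key(left_rows, right_rows, key, foreign_key):
    """Left-join right_rows onto left_rows where left[key] == right[foreign_key].

    Hypothetical helper: rows are plain dicts, as produced by csv.DictReader.
    """
    # Index the right-hand rows by their foreign key for fast lookup.
    index = {}
    for row in right_rows:
        index.setdefault(row[foreign_key], []).append(row)

    merged = []
    for left in left_rows:
        # Rows with no match are kept unchanged (left join).
        for right in index.get(left[key], [{}]):
            combined = dict(left)
            # Drop the foreign-key column, since it duplicates the key.
            # Other same-named columns from the right file overwrite the left.
            combined.update({k: v for k, v in right.items() if k != foreign_key})
            merged.append(combined)
    return merged

# Two small in-memory "files" standing in for user-selected inputs.
tracks = list(csv.DictReader(io.StringIO("TgtId,Time,Speed\n1,10:00,12.5\n2,10:00,8.0\n")))
names = list(csv.DictReader(io.StringIO("Id,Name\n1,Alpha\n2,Bravo\n")))

merged = merge_on_key(tracks, names, key="TgtId", foreign_key="Id")
```

A left join is used here so that no rows of the primary file are lost when the second file has gaps; an inner join would be an equally plausible choice.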

File level

just view Head rows (10 rows?) - default for huge files
just view Tail rows (10 rows?)
re-sample at alternate frequency (only columns that have been annotated with type/units)
export using pre-existing output schema (essential cols for that schema must have been specified)
delete incomplete/invalid rows
interpolate incomplete/invalid cells (for annotated columns)
specify delimiter(s)
specify fixed widths
specify message type column (after doing this, there's a set of column definitions for each message type)
filter for message type
provide persistent document annotations for file type (Source, link to format reference)
register input schema for re-use in application
export input schema for exchange
push file to cloud storage
provide version tracking for all file modifications (with comments - see Version History below)
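
The two clean-up operations above (deleting incomplete rows, interpolating incomplete cells) could be sketched as follows. This is only a sketch under stated assumptions: the helper names are hypothetical, a missing cell is represented as `None` or an empty string, and interpolation is linear by row position, which is only appropriate when rows are evenly spaced in time.

```python
def drop_incomplete_rows(rows, required):
    """Keep only rows where every required column has a value."""
    return [
        row for row in rows
        if all(row.get(col) not in (None, "") for col in required)
    ]

def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Assumes evenly spaced samples. Leading and trailing gaps are left as
    None, because there is no neighbour on one side to interpolate from.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (result[b] - result[a]) / (b - a)
        for i in range(a + 1, b):
            result[i] = result[a] + step * (i - a)
    return result

rows = [
    {"Time": "10:00", "Speed": "12.5"},
    {"Time": "10:01", "Speed": ""},
]
complete = drop_incomplete_rows(rows, required=["Time", "Speed"])
filled = interpolate_missing([1.0, None, None, 4.0, None])
```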

Column level

specify data type & units
add calculated column based on selected one(s) [see below for calculation examples]
apply time offset (for time data type)
view statistical overview (five-number summary for quantitative, other for qualitative)
view xy-plot (if quantitative, plot against time if time column annotated)
view value listing (if qualitative. Only first few values if there are lots of them)
remove outliers (if quantitative, configurable)
mark as identity column (used for grouping data for export, e.g. TgtId)
delete column
hide (ignore) column
provide persistent column annotations
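
For quantitative columns, the statistical overview and configurable outlier removal listed above might look like the sketch below. The function names are hypothetical, and the outlier rule shown (Tukey fences at `k` times the interquartile range) is an assumption; the list above does not say which rule the application would use.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def remove_outliers(values, k=1.5):
    """Drop values further than k * IQR outside the quartiles.

    k is the configurable part: larger values keep more data.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    spread = k * (q3 - q1)
    return [v for v in values if q1 - spread <= v <= q3 + spread]

speeds = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = five_number_summary(speeds)
cleaned = remove_outliers(speeds)
```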

Row level

delete row
insert new row (duplicating previous one)

Cell level

replace value with interpolated one

Calculations/Operations

A series of operations will be available for column-related activities. These operations will typically produce one new column from one existing column, but scripting (or custom UI) will allow one or more input columns to generate one or more output columns.

inverse
parse date/time text
parse latitude/longitude text
add time offset
units conversions (time, distance, area, velocity, acceleration)
multiply by value
add value
smooth (range of smoothing types)
apply user-provided script (the Eclipse EASE project offers a range of scripting languages)
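
One way to picture the operations above is as small functions that each take a column and return a new column, so that they can be chained. The sketch below is an illustration only: the factory names (`parse_datetime`, `add_time_offset`, `multiply_by`), the `apply_operations` helper and the sample data are assumptions, not part of the application.

```python
from datetime import datetime, timedelta

def parse_datetime(fmt):
    """Operation: parse date/time text using a strptime format string."""
    return lambda column: [datetime.strptime(v, fmt) for v in column]

def add_time_offset(seconds):
    """Operation: shift every time value by a fixed number of seconds."""
    delta = timedelta(seconds=seconds)
    return lambda column: [v + delta for v in column]

def multiply_by(factor):
    """Operation: multiply every value by a constant (also covers unit conversion)."""
    return lambda column: [v * factor for v in column]

# One knot is 1852 metres per hour, so this converts knots to metres/second.
KNOTS_TO_MPS = 1852 / 3600

def apply_operations(column, operations):
    """Run each operation in order, feeding its output to the next."""
    for operation in operations:
        column = operation(column)
    return column

times = apply_operations(
    ["2014-07-01 10:00:00"],
    [parse_datetime("%Y-%m-%d %H:%M:%S"), add_time_offset(3600)],
)
speeds_mps = apply_operations([10.0], [multiply_by(KNOTS_TO_MPS)])
```

Each operation here maps one input column to one output column; the multi-column case mentioned above would need a different function signature.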

Version History

Sample of version history: Version Snapshot
