M.U.P.P.I.X. purveyors of fine Data Analysis Tools

Clean-up:

The purpose of Clean-up is to delete unwanted text and change the structure into its final shape. Occasionally this requires one of the Patterns listed below.

Template: Delete unwanted text
Delete lines with mytext at the beginning or end
Delete everything after/before mytext
Delete lines with mytext in certain columns
Delete certain columns
Delete Duplicate lines
Delete Blank Lines
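
As a rough illustration of the deletions listed above, here is a minimal Python sketch. It is not Muppix's own one-liner set: the function names (delete_unwanted, delete_after, delete_before) are hypothetical, and it assumes that "delete everything after mytext" keeps mytext itself.

```python
def delete_unwanted(lines, mytext):
    """Drop blank lines, lines that begin or end with mytext, and duplicates."""
    seen = set()
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(mytext) or stripped.endswith(mytext):
            continue  # mytext at the beginning or end of the line
        if line in seen:
            continue  # duplicate of a line already kept
        seen.add(line)
        kept.append(line)
    return kept


def delete_after(line, mytext):
    """Delete everything after the first mytext (assumption: mytext is kept)."""
    head, sep, _tail = line.partition(mytext)
    return head + sep


def delete_before(line, mytext):
    """Delete everything before the first mytext (assumption: mytext is kept)."""
    _head, sep, tail = line.partition(mytext)
    return sep + tail if sep else line
```

Lines that do not contain mytext pass through delete_after and delete_before unchanged.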

Template: Clean-up data to make it consistent or to line it up to change its structure
Replace mytext with mysecondtext
Select mytext, then on rest of line replace mysecondtext with mythirdtext
Insert mytext
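
The three replacements above could be sketched in Python as follows. The function names are hypothetical, and "Insert mytext" is assumed here to mean inserting at the start of each line; other positions would need a different helper.

```python
def replace_text(line, mytext, mysecondtext):
    """Replace every occurrence of mytext with mysecondtext."""
    return line.replace(mytext, mysecondtext)


def replace_after(line, mytext, mysecondtext, mythirdtext):
    """On lines containing mytext, replace mysecondtext with mythirdtext,
    but only in the part of the line that follows mytext."""
    head, sep, tail = line.partition(mytext)
    if not sep:
        return line  # mytext not found: line is not selected
    return head + sep + tail.replace(mysecondtext, mythirdtext)


def insert_at_start(line, mytext):
    """Insert mytext at the start of the line (assumed position)."""
    return mytext + line
```

For example, replace_after("a-b ERROR c-d", "ERROR", "-", "_") leaves the text before ERROR untouched and changes only the hyphen that follows it.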
                    
Template: Whip into shape - Change structure
Purpose: you now have the essential information on lines, but it needs to be cleaned up, or it's in the wrong shape and needs to be turned into a long list, or a table for a spreadsheet, or stripped of duplicates, etc.

Split lines based on mytext (i.e. keep the text after or before mytext)


Left Align, Right Align
Select certain columns, delete others

Sort, sort by a certain column(s)
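
A minimal Python sketch of the reshaping steps above, under a few assumptions: columns are whitespace-separated unless a delimiter is given, column numbers start at 1, and sorting compares columns as text. All function names are hypothetical.

```python
def split_on(line, mytext):
    """Split a line at the first mytext; return (text before, text after)."""
    before, _sep, after = line.partition(mytext)
    return before, after


def align(lines, right=False):
    """Left-align by stripping surrounding whitespace, or right-align
    by padding every line to the width of the longest one."""
    stripped = [line.strip() for line in lines]
    width = max((len(line) for line in stripped), default=0)
    return [line.rjust(width) if right else line for line in stripped]


def select_columns(line, columns, delimiter=None):
    """Keep only the listed 1-based columns, dropping the others."""
    fields = line.split(delimiter)
    return [fields[i - 1] for i in columns if 0 < i <= len(fields)]


def sort_by_column(lines, column, delimiter=None):
    """Sort lines by one 1-based column, compared as text."""
    def key(line):
        fields = line.split(delimiter)
        return fields[column - 1] if column <= len(fields) else ""
    return sorted(lines, key=key)
```

Because the sort compares text, "10" sorts before "9"; a numeric sort would need the key converted with int or float.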


Special structures:
Convert Table into Spreadsheet (CSV)
Address Pattern
Product List Pattern
Occurrence - Pivot Table
Thesaurus - research words that often occur near mytext, continue searching, and build up a list of keywords based on your original mytext
Log file tips
Website text into a table
Paragraph: select the first line of a paragraph, delete the last line of a paragraph
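
Two of the special structures above, converting a table into CSV and counting occurrences as a simple pivot, might look like this in Python. This is a sketch only: it assumes the table is whitespace-separated with no spaces inside a cell, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def table_to_csv(lines):
    """Turn whitespace-separated table lines into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(line.split())
    return buffer.getvalue()


def occurrences(lines, column):
    """Count how often each value appears in a 1-based column,
    most frequent first."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts.most_common()
```

The csv module quotes any cell that contains a comma, so the output should open cleanly in a spreadsheet.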


Muppix provides innovative solutions and training to make sense of large-scale data.
Backed by years of industry experience, the Muppix Team has developed a free Data Science Toolkit to extract and analyse multi-structured information from diverse data sources.

