M.U.P.P.I.X. purveyors of fine Data Analysis Tools
Apple and Linux computers already have a Terminal built in, which is all you need to run the Muppix commands. On a Windows PC you will need to install a free Unix-like environment from Red Hat's www.Cygwin.com. Check out the Download section.

Find the Toolkit Keyword. Use the Capture & Cleanup Templates to break down your Data Extraction task. If you don't even know where the data is, use the Explore Template. The Glossary lists the simple Toolkit terminology.

Try this Spreadsheet to get a feel for the keywords: muppix keywords

Find your command(s). Start a Terminal and search the Toolkit with the Keywords to find your Command. Use the Basic Navigation for help with using the Terminal.


Run the Commands. Open up a second Terminal, go to the data, paste the commands and fill in your own search text in the coloured section. Run the Command and view the results. Refine using more Commands, then finally save the results.


___________________________________________________________________________________________________________________
For example: generate a Spreadsheet of Investment Trust fund prices direct from a website.
Copy the whole website into Notepad and save it.

Find Keywords

  • Cut/Paste the whole text from a website (in this example Trustnet.com) and save the file to, say, d:/data/myprices.txt
  • Use the Cleanup Template on the muppix website, and notice that the lines with prices always contain these 2 different texts: "+" as well as "chart". In fact "+" always appears before "chart" on each line with prices.
        So the Muppix keywords to use here are  mytext before mysecondtext
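
To get a feel for what "mytext before mysecondtext" selects, here is a small sketch run on made-up lines. The sample text and the variable names are illustrative assumptions, not part of the Muppix Toolkit. The plus sign is written as \+ because "+" is a regular-expression operator in awk.

```shell
# Hypothetical sample lines standing in for text pasted from a web page.
sample=$(mktemp)
printf '%s\n' \
  '8 [+] chart Aberdeen New Thai IT PLC 514.00' \
  'chart only, no plus sign' \
  'Some heading line' \
  'chart [+] wrong order' > "$sample"

# Keep only lines where a literal "+" appears somewhere before "chart".
matches=$(awk '$0 ~ /\+.*chart/' "$sample")
echo "$matches"

rm -f "$sample"
```

Only the first sample line is kept: the others either lack one of the two texts or have them in the wrong order.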

_______________________________________________________________________________________________________________________________________

Find the Command in the muppix.txt toolkit

  • Open a Terminal
  • Go to where you've saved the Muppix Toolkit: c:/muppix
  • Search for the keywords mytext before mysecondtext


Linux Terminal

$ cd c:
/cygdrive/c
$ cd muppix
/cygdrive/c/muppix

$ cat muppix.txt | grep mytext | grep mysecondtext | grep before
awk '$0 ~ /mytext.*mysecondtext/'       ## 'mytext' is before 'mysecondtext', 'mysecondtext' after 'mytext'
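
The search above can also be tried on a tiny stand-in file. This is a minimal sketch: the temporary file, its second entry and the variable names are assumptions made for illustration, and the real muppix.txt holds many more entries. It also shows one way to keep only the part of the entry in front of the "##" comment, ready for pasting.

```shell
# Hypothetical two-line stand-in for muppix.txt (the second entry is invented).
toolkit=$(mktemp)
cat > "$toolkit" <<'EOF'
awk '$0 ~ /mytext.*mysecondtext/'       ## 'mytext' is before 'mysecondtext', 'mysecondtext' after 'mytext'
grep 'mytext'       ## select lines with 'mytext'
EOF

# Chain greps so that only entries mentioning all three keywords remain.
entry=$(grep mytext "$toolkit" | grep mysecondtext | grep before)

# Keep just the command: drop the "##" comment and the spaces in front of it.
command_only=$(printf '%s\n' "$entry" | sed 's/[[:space:]]*##.*$//')
echo "$command_only"

rm -f "$toolkit"
```

Each extra grep narrows the list further, so adding or dropping a keyword is a quick way to widen or tighten the search.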


Then run the Command(s) on the myprices.txt file

  • Open another Terminal to search the file with prices from the website
  • Go to the directory of the new file: d:/data
  • Check that the most recently created myprices.txt file is really there
  • Show the myprices.txt file and Cut/Paste the command (the text before the "##")
  • Fill in the coloured section of the command with "+" and "chart" (write the plus sign as \+ so that awk treats it as a literal character)
  • Save the results as a spreadsheet (csv extension)

Linux Terminal

$ cd d:
/cygdrive/d
$ cd data
/cygdrive/d/data

$ ls -ltr
----------+ 1 Owner None 19576 May  7 20:48 myprices.txt

$ cat myprices.txt | awk '$0 ~ /\+.*chart/'
8    [+]    chart    Aberdeen New Thai IT PLC Ord 25p    Aberdeen Asset Managers    Equity    514.00    -6.8    551.65    1.56   
9    [+]    chart    Aberdeen Private Equity Ltd GBP    Aberdeen Asset Managers    Equity    83.75    -23.7    109.77    2.39   
10    [+]    chart    Aberdeen Smaller Companies High Income Ord 50p    Aberdeen Asset Managers    Equity    181.50    -5.2   
11    [+]    chart    Aberforth Geared Income Trust Plc Ord 1P    Aberforth Partners    Equity    123.50    -15.2    145.63    5.24   
12    [+]    chart    Aberforth Smaller Companies Trust plc Ord    Aberforth Partners    Equity    813.50    -11.1    914.51    2.74   
13    [+]    chart    Absolute Return Trust GBP    Fauchier Partners    Hedge    113.50    -15.5    134.36    n/a   
14    [+]    chart    Acencia Debt Strategy Ltd    Saltus Partners LLP    Hedge    95.75    -9.6    105.92    3.58   


$ cat myprices.txt | awk '$0 ~ /\+.*chart/' > myspreadsheet.csv
  • Finally double click on the myspreadsheet.csv file in Windows Explorer and import into Excel
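
The filtered lines keep whatever separators the pasted text had, so Excel may place each line in a single cell. If that happens, the sketch below is one way to write true comma-separated values. It assumes the columns in the pasted text are separated by tabs, which may not hold for your file; the sample data and variable names are illustrative only.

```shell
# Hypothetical tab-separated sample: one price line and one unrelated line.
prices=$(mktemp)
printf '8\t[+]\tchart\tAberdeen New Thai IT PLC Ord 25p\tAberdeen Asset Managers\tEquity\t514.00\t-6.8\t551.65\t1.56\n' > "$prices"
printf 'Sponsored links\n' >> "$prices"

# Select the price lines, then join the tab-separated fields with commas.
# Every field is wrapped in double quotes so that commas inside a name are safe.
csv=$(awk -F'\t' '$0 ~ /\+.*chart/ {
    line = ""
    for (i = 1; i <= NF; i++) {
        field = $i
        gsub(/"/, "\"\"", field)          # double any embedded quotes
        line = line (i > 1 ? "," : "") "\"" field "\""
    }
    print line
}' "$prices")
echo "$csv"

rm -f "$prices"
```

To save the result, redirect the awk output to myspreadsheet.csv in the same way as the command above.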

Muppix provides innovative solutions and Training to make sense of large scale data.
Backed by years of industry experience, the Muppix Team have developed a Free Data Science Toolkit to extract and analyse multi-structured information from diverse data sources.

