
Bit a trouble with a script to parse out specific lines in text file

<<

Triban

Hero Member

Posts: 620

Joined: Fri Feb 19, 2010 4:17 pm

Post Wed Jan 25, 2012 12:08 pm

This is a straight text file.  I am writing the script using PowerShell since I am a bit more familiar working with text files in PS.  But I can do this in VBScript or Python if it would be easier.

So this file contains a number of lines and there are many tabbed areas.  I was able to remove the tab spacing from the file to clean it up a bit.  I was also successful at pulling out the lines that matched a specific string.  But then the wrench happened: a large number of lines have carriage returns within the string contents.  Here are my goals...

String to match is "AUTHORITY-CHECK OBJECT"

1) Remove Empty Tab Spacing - done
  Code:
(gc FILENAME) -replace ' {2,}','' | sc NEWFILENAME

I am using variables where you see filenames since the paths are long and I will eventually replace with a user input box of sorts.

2) Remove carriage returns after "AUTHORITY-CHECK " if the next line begins with OBJECT or there is nothing after AUTHORITY-CHECK

3) Write to new clean file - done
4) Select lines from the clean file that match the string "AUTHORITY-CHECK OBJECT" and write to dump/final file - done
  Code:
Select-String NEWFILENAME -pattern 'AUTHORITY-CHECK OBJECT' | foreach {$_.Line} | out-file -Encoding ASCII NEWFINALFILE


So yes, the carriage returns are mucking up my results.  The information will be used to show our devs their fixes to the app are not working, so I want to make sure all lines are included.  First results, before I noticed the carriage returns, showed 192 lines.  Going back, there are about 326 total lines containing "AUTHORITY-CHECK".
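The gap between the two counts above can be checked mechanically. Here is a minimal Python sketch, assuming the cleaned file has already been read into a list of lines; `count_matches` and the sample data are made up for illustration and are not from the real file.

```python
MARKER = "AUTHORITY-CHECK"
FULL = "AUTHORITY-CHECK OBJECT"


def count_matches(lines):
    """Return (lines containing MARKER, lines containing FULL)."""
    total = sum(1 for line in lines if MARKER in line)
    complete = sum(1 for line in lines if FULL in line)
    return total, complete


# Hypothetical sample: one complete statement, one split across two lines.
sample = [
    "AUTHORITY-CHECK OBJECT 'FIELD1' 'FIELD2'",
    "AUTHORITY-CHECK",
    "OBJECT 'FIELD1' 'FIELD2'",
    "some other line",
]

total, complete = count_matches(sample)
print(f"{total} lines mention the marker, {complete} are complete")
```

If the two numbers differ, some statements are still split across lines and the final dump is missing them.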

Any assistance would be appreciated!

Oh and my programming skills suck ;)
Certs: GCWN
(@)Dewser
<<

lorddicranius

Sr. Member

Posts: 448

Joined: Thu Mar 03, 2011 3:54 am

Post Wed Jan 25, 2012 1:43 pm

Re: Bit a trouble with a script to parse out specific lines in text file

I'm just learning to program so pardon my ignorance if I'm way off here haha, but is there an equivalent to Python's rstrip() in PowerShell?  With rstrip() in Python, you can strip all trailing whitespace from a string or specify certain characters to remove, which would be carriage returns ("\r" in Python) in this case.

Page for rstrip() in Python:
http://www.tutorialspoint.com/python/string_rstrip.htm
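For reference, a small sketch of what rstrip() does with a line ending. The sample string is made up; the point is only the difference between stripping all trailing whitespace and stripping just the CR/LF characters.

```python
# Hypothetical line as it might come out of the file, with a trailing
# space and a Windows line ending.
line = "AUTHORITY-CHECK \r\n"

# No argument: removes all trailing whitespace, including the space.
stripped_all = line.rstrip()

# With an argument: removes only the listed characters from the end,
# so the trailing space before the line ending survives.
stripped_eol = line.rstrip("\r\n")

print(repr(stripped_all))
print(repr(stripped_eol))
```

Note that stripping the line ending alone does not join the line with the one that follows; the two lines still have to be concatenated somewhere.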
GSEC, eCPPT, Sec+
<<

Triban

Hero Member

Posts: 620

Joined: Fri Feb 19, 2010 4:17 pm

Post Wed Jan 25, 2012 2:11 pm

Re: Bit a trouble with a script to parse out specific lines in text file

I think Trim() can do it but I haven't found good steps for using Trim.  TrimStart() can be used for space before the string value and TrimEnd() is the equivalent for the end.  The initial code I use to clean the first file works well enough.  The part I got hung up on is the code to find carriage returns that appear after my string value.  I want to find and replace them with a space, or find and include the next line as part of the entire string.

Basically there are a series of fields that follow the string I am searching for.  In the case of the carriage return, that set of data is dropped to the next line.

AUTHORITY-CHECK<CRLF>
OBJECT 'FIELD1' 'FIELD2' 'FIELD3' ..... 'LASTFIELD'

So I want to kill that CRLF at the end of the first line and join that line with the 2nd line, or make it so OBJECT is seen as part of the first line.

I could toss this in Python, but I know PS much better.  Either way I will have to install Python or PS on my coworker's system so he can run these scripts.  I was pondering vbscript for this since it will natively be supported.
Certs: GCWN
(@)Dewser
<<

lorddicranius

Sr. Member

Posts: 448

Joined: Thu Mar 03, 2011 3:54 am

Post Wed Jan 25, 2012 2:30 pm

Re: Bit a trouble with a script to parse out specific lines in text file

Ooh, I see.  That makes more sense to me now...and beyond my programming knowledge lol.  Hopefully somebody else here can chime in with more experience...
Last edited by lorddicranius on Wed Jan 25, 2012 6:05 pm, edited 1 time in total.
GSEC, eCPPT, Sec+
<<

lorddicranius

Sr. Member

Posts: 448

Joined: Thu Mar 03, 2011 3:54 am

Post Wed Jan 25, 2012 6:05 pm

Re: Bit a trouble with a script to parse out specific lines in text file

I haven't gotten to working with files yet in my programming learning, but can you identify the line numbers of the lines you find "AUTHORITY-CHECK<CRLF>" on?  For example, as your script is searching through the text file, it finds "AUTHORITY-CHECK<CRLF>" on line 5.  If so, maybe you can check that line number + 1 to see if the next line (5 + 1) begins with "OBJECT..." (if the line after the one matching "AUTHORITY-CHECK" starts with "OBJECT").  And if it does, then strip off the carriage return on the "AUTHORITY-CHECK" line and print both of the lines together as one (print(line 5, 5+1)).
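The line-number idea above could look roughly like this in Python. This is only a sketch under the assumption that the file has been read into a list of lines; `join_by_index` is a made-up name, and a dangling AUTHORITY-CHECK with no OBJECT after it is simply left alone.

```python
def join_by_index(lines):
    """Join a bare AUTHORITY-CHECK line with the next line if it starts with OBJECT."""
    out = []
    i = 0
    while i < len(lines):
        current = lines[i].rstrip("\r\n")  # drop the line ending only
        has_next = i + 1 < len(lines)
        if (
            current.strip() == "AUTHORITY-CHECK"
            and has_next
            and lines[i + 1].lstrip().startswith("OBJECT")
        ):
            # Line i and line i + 1 belong together: emit them as one line.
            out.append(current.strip() + " " + lines[i + 1].strip())
            i += 2  # skip the line that was just consumed
        else:
            out.append(current)
            i += 1
    return out


# Hypothetical sample data, not taken from the real file.
sample = ["AUTHORITY-CHECK\r\n", "OBJECT 'F1' 'F2'\r\n", "other\r\n"]
for joined in join_by_index(sample):
    print(joined)
```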

I hope that makes sense :-\ I'm studying my Python material right now and the stuff I'm learning has me thinking about this thread again haha
GSEC, eCPPT, Sec+
<<

dynamik

Recruiters

Posts: 1119

Joined: Sun Nov 09, 2008 11:00 am

Location: Mile High City

Post Wed Jan 25, 2012 6:43 pm

Re: Bit a trouble with a script to parse out specific lines in text file

It's easy enough to strip out the new line if you have the entire thing in a single variable, but I don't think that's going to help you in this scenario.

  Code:
PS C:\> $test = "this is `r`na test"
PS C:\> $test
this is
a test
PS C:\> $test -replace "`r`n", ""
this is a test


I think it would be easiest to store the string in a temp variable until the next line is read. If the next line is AUTHORITY-CHECK, write out what's in the temp variable (or do whatever with it), because that line is complete. Alternatively, if the next line is OBJECT, concatenate that with the temp variable and then write that out (or, again, do whatever further processing is required). Similar to what lorddicranius said.
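A rough Python sketch of the temp-variable approach, for comparison. `join_with_buffer` and `pending` are made-up names, and the sketch assumes a split statement always has a bare AUTHORITY-CHECK on the first line.

```python
def join_with_buffer(lines):
    """Yield lines, holding back a bare AUTHORITY-CHECK until the next line is seen."""
    pending = None  # the temp variable
    for raw in lines:
        line = raw.rstrip("\r\n")
        if pending is not None:
            if line.lstrip().startswith("OBJECT"):
                # The held line continues here: write both out as one line.
                yield pending + " " + line.strip()
                pending = None
                continue
            # The next line is something else: write the held line out as-is.
            yield pending
            pending = None
        if line.strip() == "AUTHORITY-CHECK":
            pending = line.strip()  # hold it until the next line is read
        else:
            yield line
    if pending is not None:
        yield pending  # file ended right after a bare AUTHORITY-CHECK


# Hypothetical sample data, not taken from the real file.
sample = [
    "AUTHORITY-CHECK\r\n",
    "OBJECT 'F1'\r\n",
    "AUTHORITY-CHECK\r\n",
    "AUTHORITY-CHECK OBJECT 'F2'\r\n",
]
for joined in join_with_buffer(sample):
    print(joined)
```

Because it only ever holds one line, this version also works when streaming a large file line by line.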

Edit: I'm not that familiar with PowerShell. Can you read the entire file into a single variable instead of line by line? If so, you could do that first and then replace "`r`nObject" with "Object".
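In Python, the whole-file version of this idea might be sketched as below. It assumes the entire file fits in one string; `join_in_text` and `matching_lines` are made-up names, and the replacement deliberately inserts a single space so the two words do not run together.

```python
import re

# A bare AUTHORITY-CHECK, optional trailing blanks, a line break,
# optional leading blanks, then OBJECT at the start of the next line.
PATTERN = re.compile(r"AUTHORITY-CHECK[ \t]*\r?\n[ \t]*OBJECT")


def join_in_text(text):
    """Collapse the line break between AUTHORITY-CHECK and OBJECT into one space."""
    return PATTERN.sub("AUTHORITY-CHECK OBJECT", text)


def matching_lines(text):
    """Return the joined lines that contain the full search string."""
    joined = join_in_text(text)
    return [line for line in joined.splitlines() if "AUTHORITY-CHECK OBJECT" in line]


# Hypothetical sample data, not taken from the real file.
sample = "AUTHORITY-CHECK\r\nOBJECT 'F1'\r\nnoise\r\nAUTHORITY-CHECK OBJECT 'F2'\r\n"
for line in matching_lines(sample):
    print(line)
```

When reading a real file for this, opening it with `open(path, newline="")` keeps the original `\r\n` endings intact; the pattern accepts plain `\n` as well.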
Last edited by dynamik on Wed Jan 25, 2012 6:46 pm, edited 1 time in total.
The day you stop learning is the day you start becoming obsolete.
<<

Triban

Hero Member

Posts: 620

Joined: Fri Feb 19, 2010 4:17 pm

Post Thu Jan 26, 2012 8:41 am

Re: Bit a trouble with a script to parse out specific lines in text file

Thanks guys, the logic seems right, now it just needs to be put to code.  dynamik, to my understanding, once a Get-Content cmdlet is used on a file, you can pipe the commands together, so essentially you would:

gc filename | foreach line in file matching AUTHORITY-CHECK[CRLF] remove CRLF and combine with next line. 

Not how it would code out but that is the logic, and then you can add:

| set-content NEWFILE

Alternatively I can probably toss in some IF THEN statements to capture the broken lines. 

IF line = AUTHORITY-CHECK`r
THEN -replace "`r", " "
AND combine with next line

or something like that.  Might work on it later today since that always makes the day go quicker.  Got frustrated when I hit the carriage return problem and realized I was spending too much time on this issue and I had other higher priority items to work on (though not quite as exciting).
Certs: GCWN
(@)Dewser
