EH-Net
May 22, 2013, 09:34:47 AM
Topic: A bit of trouble with a script to parse out specific lines in a text file (Read 3491 times)
3xban
Hero Member, Posts: 608
« on: January 25, 2012, 11:08:28 AM »

This is a plain text file. I am writing the script in PowerShell, since I am more familiar with handling text files in PS, but I can do this in VBScript or Python if that would be easier.

So this file contains a number of lines, and there are many tabbed areas. I was able to remove the tab spacing from the file to clean it up a bit. I was also successful at pulling out the lines that matched a specific string. But then the wrench happened: a large number of lines have carriage returns within the string contents. Here are my goals...

String to match is "AUTHORITY-CHECK OBJECT"

1) Remove Empty Tab Spacing - done
Code:
# Delete every run of two or more spaces, then write the result to the new file
(Get-Content FILENAME) -replace ' {2,}', '' | Set-Content NEWFILENAME
I am using variables where you see file names, since the paths are long and I will eventually replace them with a user input box of some sort.

2) Remove the carriage return after "AUTHORITY-CHECK" if the next line begins with OBJECT or there is nothing after AUTHORITY-CHECK on that line

3) Write to a new, clean file - done
4) Select the lines from the clean file that match the string "AUTHORITY-CHECK OBJECT" and write them to the dump/final file - done
Code:
# Keep only the matching lines (Select-String is case-insensitive by default)
Select-String -Path NEWFILENAME -Pattern 'AUTHORITY-CHECK OBJECT' | ForEach-Object { $_.Line } | Out-File -Encoding ASCII NEWFINALFILE
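For comparison, here is a minimal Python sketch of steps 1, 3 and 4 (Python being one of the options mentioned above). It assumes the input decodes as UTF-8 or plain ASCII; the function names and the `run` driver are hypothetical, and the three path arguments stand in for FILENAME, NEWFILENAME and NEWFINALFILE. Step 2, the join, is the open question and is not handled here.

```python
import re

SEARCH = "AUTHORITY-CHECK OBJECT"


def clean_lines(lines):
    """Step 1: delete every run of two or more spaces, like
    (gc FILENAME) -replace ' {2,}',''."""
    return [re.sub(r" {2,}", "", line) for line in lines]


def select_matches(lines, needle=SEARCH):
    """Step 4: keep lines that contain the search string, treated literally.
    Select-String ignores case by default, so compare case-folded text."""
    target = needle.casefold()
    return [line for line in lines if target in line.casefold()]


def write_lines(path, lines):
    """Write one entry per line; non-ASCII characters become '?',
    roughly what Out-File -Encoding ASCII does."""
    with open(path, "w", encoding="ascii", errors="replace") as f:
        f.writelines(line + "\n" for line in lines)


def run(in_path, clean_path, final_path):
    """Hypothetical driver for steps 1, 3 and 4."""
    with open(in_path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()  # line terminators are dropped here
    cleaned = clean_lines(lines)
    write_lines(clean_path, cleaned)                  # step 3
    write_lines(final_path, select_matches(cleaned))  # step 4
```

Deleting the runs outright mirrors the PowerShell line, but if padding can ever sit between two words it glues them together (`AUTHORITY-CHECK    OBJECT` becomes `AUTHORITY-CHECKOBJECT`, which no longer matches); replacing each run with a single space is the safer choice in that case.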

So yes, the carriage returns are mucking up my results. The information will be used to show our devs that their fixes to the app are not working, so I want to make sure all lines are included. My first results, before I noticed the carriage returns, showed 192 lines. Going back, there are about 326 lines in total containing "AUTHORITY-CHECK".
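One way to confirm that nothing is missed is to count before and after joining: if every statement ends up on one line, the number of lines containing "AUTHORITY-CHECK OBJECT" should equal the number containing "AUTHORITY-CHECK", assuming the keyword never appears on its own elsewhere (in comments, for example). A Python sketch with a hypothetical `split_report` helper and made-up sample lines:

```python
def split_report(lines):
    """Return (mentions, complete, dangling):
    mentions - lines containing AUTHORITY-CHECK at all,
    complete - lines already containing AUTHORITY-CHECK OBJECT,
    dangling - lines where AUTHORITY-CHECK is the last thing on the line.
    Comparisons ignore case and surrounding whitespace."""
    mentions = complete = dangling = 0
    for line in lines:
        text = line.strip().upper()
        if "AUTHORITY-CHECK" in text:
            mentions += 1
            if "AUTHORITY-CHECK OBJECT" in text:
                complete += 1
            if text.endswith("AUTHORITY-CHECK"):
                dangling += 1
    return mentions, complete, dangling


sample = ["AUTHORITY-CHECK OBJECT 'A'", "AUTHORITY-CHECK", "OBJECT 'B'", "WRITE x."]
print(split_report(sample))  # (2, 1, 1)
```

After a successful join, `dangling` should drop to 0 and `complete` should rise to match `mentions`.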

Any assistance would be appreciated!

Oh, and my programming skills suck ;)

Certs: GCWN
(@)Dewser
lorddicranius
Sr. Member, Posts: 447
« Reply #1 on: January 25, 2012, 12:43:04 PM »

I'm just learning to program, so pardon my ignorance if I'm way off here, haha, but is there an equivalent to Python's rstrip() in PowerShell? With rstrip() in Python, you can strip all trailing whitespace from a string or specify which trailing characters to remove, which would be carriage returns ("\r" in Python) in this case.

Page for rstrip() in Python:
http://www.tutorialspoint.com/python/string_rstrip.htm
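A small Python illustration of the point: rstrip() only trims the end of a single string, so it can remove the carriage return but cannot join two lines by itself. The sample lines are made up. Python's default text mode already translates "\r\n" to "\n" when reading, so the "\r" is normally only visible when the file is opened with newline="". In PowerShell the closest counterpart is the .NET String.TrimEnd() method.

```python
# Two physical lines from a CRLF file, as readlines() returns them when the
# file is opened with newline="" (so the "\r" is kept).
raw = ["AUTHORITY-CHECK\r\n", "OBJECT 'FIELD1' 'FIELD2'\r\n"]

# rstrip("\r\n") removes any trailing mix of "\r" and "\n" and nothing else;
# plain rstrip() would also remove trailing spaces and tabs.
stripped = [line.rstrip("\r\n") for line in raw]

# Stripping leaves two separate strings; joining them is a separate step.
joined = " ".join(stripped)

print(stripped)  # ['AUTHORITY-CHECK', "OBJECT 'FIELD1' 'FIELD2'"]
print(joined)    # AUTHORITY-CHECK OBJECT 'FIELD1' 'FIELD2'
```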

GSEC, eCPPT, Sec+
3xban
Hero Member, Posts: 608
« Reply #2 on: January 25, 2012, 01:11:14 PM »

I think Trim() can do it, but I haven't found good examples of how to use it. TrimStart() can be used for whitespace before the string value, and TrimEnd() is the equivalent for the end. The initial code I use to clean the first file works well enough. The part I got hung up on is the code to find carriage returns that appear after my string value. I want to find them and replace them with a space, or find the next line and include it as part of the entire string.

Basically, there is a series of fields that follows the string I am searching for. When there is a carriage return, that set of data drops to the next line.

AUTHORITY-CHECK<CRLF>
OBJECT 'FIELD1' 'FIELD2' 'FIELD3' ..... 'LASTFIELD'

So I want to kill that CRLF at the end of the first line and join that line with the second line, or make it so OBJECT is seen as part of the first line.

I could toss this in Python, but I know PS much better. Either way, I will have to install Python or PS on my coworker's system so he can run these scripts. I was pondering VBScript for this, since it is supported natively on Windows.

lorddicranius
Sr. Member, Posts: 447
« Reply #3 on: January 25, 2012, 01:30:11 PM »

Ooh, I see. That makes more sense to me now... and it's beyond my programming knowledge, lol. Hopefully somebody here with more experience can chime in...
« Last Edit: January 25, 2012, 05:05:14 PM by lorddicranius »

lorddicranius
Sr. Member, Posts: 447
« Reply #4 on: January 25, 2012, 05:05:53 PM »

I haven't gotten to working with files in my programming studies yet, but can you identify the line numbers of the lines where you find "AUTHORITY-CHECK<CRLF>"? For example, as your script searches through the text file, it finds "AUTHORITY-CHECK<CRLF>" on line 5. If so, maybe you can check line number + 1 to see whether the next line (5 + 1) begins with "OBJECT..." (that is, whether the line after the one matching "AUTHORITY-CHECK" starts with "OBJECT"). If it does, strip the carriage return off the "AUTHORITY-CHECK" line and print both lines together as one (print(line 5, line 5+1)).
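The line-number idea translates almost directly into a loop over list indices. A minimal Python sketch, assuming the whole file fits in memory as a list of lines without terminators; `join_by_index` is a hypothetical name, and the checks ignore case to match Select-String's default behavior.

```python
def join_by_index(lines, head="AUTHORITY-CHECK", tail="OBJECT"):
    """Merge line i with line i + 1 when line i ends with `head` and
    line i + 1 starts with `tail` (case-insensitive)."""
    out = []
    i = 0
    while i < len(lines):
        current = lines[i].rstrip()
        has_next = i + 1 < len(lines)
        if (has_next and current.upper().endswith(head)
                and lines[i + 1].lstrip().upper().startswith(tail)):
            out.append(current + " " + lines[i + 1].lstrip())
            i += 2  # line i + 1 has been consumed by the merge
        else:
            out.append(lines[i])
            i += 1
    return out


lines = ["x", "AUTHORITY-CHECK", "OBJECT 'A' 'B'", "AUTHORITY-CHECK OBJECT 'C'"]
print(join_by_index(lines))
# ['x', "AUTHORITY-CHECK OBJECT 'A' 'B'", "AUTHORITY-CHECK OBJECT 'C'"]
```

Skipping ahead by two after a merge is what keeps the OBJECT line from being printed a second time on its own.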

I hope that makes sense :-\ I'm studying my Python material right now, and what I'm learning has me thinking about this thread again, haha.

ajohnson (aka dynamik)
Recruiters, Hero Member, Posts: 1057
« Reply #5 on: January 25, 2012, 05:43:47 PM »

It's easy enough to strip out the newline if you have the entire thing in a single variable, but I don't think that's going to help you in this scenario.

Code:
PS C:\> $test = "this is `r`na test"
PS C:\> $test
this is
a test
PS C:\> $test -replace "`r`n", ""
this is a test

I think it would be easiest to store the string in a temp variable until the next line is read. If the next line is AUTHORITY-CHECK, write out (or do whatever you need with) what's in the temp variable, because that line is complete. Alternatively, if the next line is OBJECT, concatenate it with the temp variable and then write that out (or, again, do whatever further processing is required). This is similar to what lorddicranius said.
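The temp-variable approach can be sketched in Python as a generator, so lines are handled one at a time without keeping the whole file in memory. This is a sketch under stated assumptions: the checks are case-insensitive, a statement is split across at most two lines, and `join_with_buffer` is a hypothetical name.

```python
def join_with_buffer(lines, head="AUTHORITY-CHECK", tail="OBJECT"):
    """Yield lines, holding a line that ends with `head` in `pending`
    until the next line shows whether the statement continues."""
    pending = None
    for line in lines:
        if pending is not None:
            if line.lstrip().upper().startswith(tail):
                # Next line is OBJECT...: concatenate and emit one line.
                yield pending + " " + line.lstrip()
                pending = None
                continue
            # Next line is something else: the held line was complete.
            yield pending
            pending = None
        if line.rstrip().upper().endswith(head):
            pending = line.rstrip()  # hold it until the next line is read
        else:
            yield line
    if pending is not None:
        yield pending  # the file ended while a line was being held


sample = ["AUTHORITY-CHECK", "OBJECT 'A'", "AUTHORITY-CHECK",
          "AUTHORITY-CHECK", "  OBJECT 'B'", "end"]
print(list(join_with_buffer(sample)))
```

Feed it lines without terminators (for example from `str.splitlines()`); a held line that turns out to be complete is still processed normally, so two dangling AUTHORITY-CHECK lines in a row are handled correctly.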

Edit: I'm not that familiar with PowerShell. Can you read the entire file into a single variable instead of line by line? If so, you could do that first and then replace "`r`nOBJECT" with " OBJECT" (keeping a space so the two words stay separated).
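Reading the whole file into one string does work. In PowerShell, [System.IO.File]::ReadAllText() returns a file as a single string, and PowerShell 3.0 later added Get-Content -Raw for the same purpose. A Python sketch of the idea, using a regular expression instead of a plain replace so it also tolerates stray spaces around the break and LF-only line endings; the pattern and the function names are illustrative assumptions, not the poster's code.

```python
import re

# AUTHORITY-CHECK, optional trailing spaces/tabs, exactly one line break
# (CRLF, LF or CR), optional leading spaces/tabs, then OBJECT.
BROKEN = re.compile(
    r"(AUTHORITY-CHECK)[ \t]*(?:\r\n|\n|\r)[ \t]*(OBJECT)", re.IGNORECASE
)


def join_whole_text(text):
    """Put each split statement back on one line, keeping the original
    spelling of both keywords and a single space between them."""
    return BROKEN.sub(r"\1 \2", text)


def load_joined_lines(path):
    """Hypothetical helper: read the file as one string, join, then split."""
    # newline="" keeps the original "\r\n" so the pattern sees real endings.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        return join_whole_text(f.read()).splitlines()


print(join_whole_text("AUTHORITY-CHECK\r\nOBJECT 'F1' 'F2'"))
# AUTHORITY-CHECK OBJECT 'F1' 'F2'
```

Statements that are already on one line are left untouched, because the pattern requires a line break between the two keywords.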
« Last Edit: January 25, 2012, 05:46:18 PM by dynamik »

WIP: GCFA | www.infosiege.net | @infosiege

The day you stop learning is the day you start becoming obsolete.
3xban
Hero Member, Posts: 608
« Reply #6 on: January 26, 2012, 07:41:08 AM »

Thanks, guys, the logic seems right; now it just needs to be put into code. Dynamik, as I understand it, once the Get-Content cmdlet is used on a file, you can pipe commands together, so essentially you would do:

gc filename | foreach line in file matching AUTHORITY-CHECK[CRLF] remove CRLF and combine with next line. 

That's not how it would code out, but that is the logic, and then you can add:

| set-content NEWFILE

Alternatively, I can probably toss in some IF/THEN statements to capture the broken lines.

IF line = AUTHORITY-CHECK`r
THEN -replace "`r", " "
AND combine with next line

or something like that. I might work on it later today, since that always makes the day go quicker. I got frustrated when I hit the carriage return problem and realized I was spending too much time on this issue while I had other, higher-priority items to work on (though not quite as exciting).
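One detail worth checking before writing the IF test: Get-Content returns each line with its terminator already removed, so a line read that way never ends in `r, and the test has to look at the text itself ("nothing after AUTHORITY-CHECK"). Python's str.splitlines() behaves the same way, as this small sketch on made-up input shows; `is_dangling` is a hypothetical helper.

```python
# Splitting on line boundaries removes the terminators themselves, much as
# Get-Content hands PowerShell each line without its `r`n.
raw = "AUTHORITY-CHECK\r\nOBJECT 'FIELD1' 'FIELD2'\r\nAUTHORITY-CHECK OBJECT 'FIELD3'\r\n"
lines = raw.splitlines()


def is_dangling(line):
    """True when nothing but whitespace follows AUTHORITY-CHECK on the line,
    i.e. the statement continues on the next line (case-insensitive)."""
    return line.rstrip().upper().endswith("AUTHORITY-CHECK")


print(lines)
print([is_dangling(line) for line in lines])  # [True, False, False]
```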