How Python helped me to automate my Cyber Stuff
I am writing this post after completing three weeks of learning Python, and I am seriously impressed with this language.
The best part is that it lets you talk to almost any API, and it has lots of useful modules. Like:
- Webbrowser. Comes with Python and opens a browser to a specific page.
- Requests. Downloads files and web pages from the Internet.
- Beautiful Soup. Parses HTML, the format that web pages are written in.
- Selenium. Launches and controls a web browser. Selenium is able to fill in forms and simulate mouse clicks in this browser.
Web Browser Module
The webbrowser module’s open() function can launch a new browser to a specified URL. Enter the following into the interactive shell:
>>> import webbrowser
>>> webbrowser.open('http://cyberknowledgebase.com/')
Downloading Files from the Web with the requests Module
The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression. The requests module doesn’t come with Python, so you’ll have to install it first. From the command line, run pip install requests.
>>> import requests
If no error messages show up, then the requests module has been successfully installed.
Downloading a Web Page with the requests.get() Function
The requests.get() function takes a string of a URL to download. By calling type() on requests.get()’s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request. I’ll explain the Response object in more detail later, but for now, enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests
>>> res = requests.get('Enter URL from which you want to download')
>>> type(res)
<class 'requests.models.Response'>
>>> res.status_code == requests.codes.ok
True
Checking for Errors
As you’ve seen, the Response object has a status_code attribute that can be checked against requests.codes.ok to see whether the download succeeded. A simpler way to check for success is to call the raise_for_status() method on the Response object. This will raise an exception if there was an error downloading the file and will do nothing if the download succeeded. Enter the following into the interactive shell:
>>> res = requests.get('http://cyberknowledgebase.com/page_that_does_not_exist')
>>> res.raise_for_status()
Traceback (most recent call last):
  File "<pyshell#138>", line 1, in <module>
    res.raise_for_status()
  File "C:\Python34\lib\site-packages\requests\models.py", line 773, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found
The raise_for_status() method is a good way to ensure that a program halts if a bad download occurs. This is a good thing: you want your program to stop as soon as some unexpected error happens. If a failed download isn’t a deal breaker for your program, you can wrap the raise_for_status() line with try and except statements to handle this error case without crashing.
import requests
res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
try:
    res.raise_for_status()
except Exception as exc:
    print('There was a problem: %s' % (exc))
This raise_for_status() method call causes the program to output the following:
There was a problem: 404 Client Error: Not Found
Always call raise_for_status() after calling requests.get(). You want to be sure that the download has actually worked before your program continues.
Saving Downloaded Files to the Hard Drive
From here, you can save the web page to a file on your hard drive with the standard open() function and write() method. There are some slight differences, though. First, you must open the file in write binary mode by passing the string 'wb' as the second argument to open(). Even if the page is in plaintext, you need to write binary data instead of text data in order to maintain the Unicode encoding of the text.
To write the web page to a file, you can use a for loop with the Response object’s iter_content() method.
>>> import requests
>>> res = requests.get('https://cyberknowledgebase.com/abc.txt')
>>> res.raise_for_status()
>>> playFile = open('abc.txt', 'wb')
>>> for chunk in res.iter_content(100000):
        playFile.write(chunk)

100000
78981
>>> playFile.close()
The iter_content() method returns “chunks” of the content on each iteration through the loop. Each chunk is of the bytes data type, and you get to specify how many bytes each chunk will contain. One hundred thousand bytes is generally a good size, so pass 100000 as the argument to iter_content().
The file abc.txt will now exist in the current working directory. Note that the filename you pass to open() doesn’t have to match the filename on the website; you can save the download under any name you like. The requests module simply handles downloading the contents of web pages. Once the page is downloaded, it is simply data in your program. Even if you were to lose your Internet connection after downloading the web page, all the page data would still be on your computer.
The write() method returns the number of bytes written to the file.
To review, here’s the complete process for downloading and saving a file:
- Call requests.get() to download the file.
- Call open() with 'wb' to create a new file in write binary mode.
- Loop over the Response object’s iter_content() method.
- Call write() on each iteration to write the content to the file.
- Call close() to close the file.
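Putting those five steps together, here is a minimal sketch of the whole download-and-save flow. The URL and the output filename are placeholders for illustration, not part of the original example:

import requests

url = 'https://cyberknowledgebase.com/abc.txt'   # placeholder URL
res = requests.get(url)                          # step 1: download the file
res.raise_for_status()                           # stop early if the download failed
outFile = open('downloaded.txt', 'wb')           # step 2: open a file in write binary mode
for chunk in res.iter_content(100000):           # step 3: loop over 100,000-byte chunks
    outFile.write(chunk)                         # step 4: write each chunk to the file
outFile.close()                                  # step 5: close the file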
That’s all there is to the requests module! The for loop and iter_content() stuff may seem complicated compared to the open()/write()/close() workflow you’ve been using to write text files, but it’s there to ensure that the requests module doesn’t eat up too much memory even if you download massive files.
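One extra tip that goes beyond what I covered above (so treat it as an optional refinement): by default requests.get() still reads the whole response into memory; passing stream=True makes it fetch the body lazily, which is what really keeps memory usage low for very large files. A quick sketch:

import requests

# stream=True defers downloading the body until you iterate over it
res = requests.get('https://cyberknowledgebase.com/abc.txt', stream=True)
res.raise_for_status()
with open('abc.txt', 'wb') as f:
    for chunk in res.iter_content(100000):   # chunks are read from the network as needed
        f.write(chunk)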
Example: downloading X-Force data
import requests, sys, bs4, webbrowser

print("X force results")
x = input("Enter the IP\n")
# Query IBM X-Force Exchange for the entered IP
res = requests.get('https://exchange.xforce.ibmcloud.com/search/' + x)
res.raise_for_status()
print(res)
# Save the response body to a file named 'res', 100,000 bytes at a time
malware_response = open('res', 'wb')
for malware_data in res.iter_content(100000):
    malware_response.write(malware_data)
malware_response.close()
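Since webbrowser is already imported, one small variation (my suggestion, not part of the original script) is to skip saving the page and just open the X-Force search result directly in the browser:

import webbrowser

x = input("Enter the IP\n")
# Open the X-Force Exchange search page for the IP in the default browser
webbrowser.open('https://exchange.xforce.ibmcloud.com/search/' + x)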
Parsing HTML with the BeautifulSoup Module
Beautiful Soup is a module for extracting information from an HTML page (and is much better for this purpose than regular expressions). The BeautifulSoup module’s name is bs4 (for Beautiful Soup, version 4). To install it, you will need to run pip install beautifulsoup4 from the command line. To import Beautiful Soup you run import bs4. As an example, we’ll work with this simple HTML file, saved as example.html:
<!-- This is the example.html example file. -->
<html><head><title>Cyber Knowledge Base</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://cyberknowledgebase.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Davinder</span></p>
</body></html>
As you can see, even a simple HTML file involves many different tags and attributes, and matters quickly get confusing with complex websites. Thankfully, Beautiful Soup makes working with HTML much easier.
Creating a BeautifulSoup Object from HTML
The bs4.BeautifulSoup() function needs to be called with a string containing the HTML it will parse. The bs4.BeautifulSoup() function returns a BeautifulSoup object. Enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests, bs4
>>> res = requests.get('https://cyberknowledgebase.com')
>>> res.raise_for_status()
>>> noStarchSoup = bs4.BeautifulSoup(res.text)
>>> type(noStarchSoup)
<class 'bs4.BeautifulSoup'>
This code uses requests.get() to download the main page from my website and then passes the text attribute of the response to bs4.BeautifulSoup(). The BeautifulSoup object that it returns is stored in a variable named noStarchSoup.
You can also load an HTML file from your hard drive by passing a File object to bs4.BeautifulSoup(). Enter the following into the interactive shell (make sure the example.html file is in the working directory):
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile)
>>> type(exampleSoup)
<class 'bs4.BeautifulSoup'>
Once you have a BeautifulSoup object, you can use its methods to locate specific parts of an HTML document.
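For instance, here is a quick sketch using the example.html shown earlier: the select() method takes a CSS selector and returns matching elements. The selectors below are just my illustration; Beautiful Soup supports many more:

>>> import bs4
>>> exampleSoup = bs4.BeautifulSoup(open('example.html'), 'html.parser')
>>> exampleSoup.select('#author')[0].getText()      # element with id="author"
'Al Davinder'
>>> exampleSoup.select('p.slogan')[0].getText()     # <p> element with class="slogan"
'Learn Python the easy way!'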
The requests and BeautifulSoup modules are great as long as you can figure out the URL you need to pass to requests.get(). However, sometimes this isn’t so easy to find. Or perhaps the website you want your program to navigate requires you to log in first. The selenium module will give your programs the power to perform such sophisticated tasks.
Controlling the Browser with the selenium Module
Importing the modules for Selenium is slightly tricky. Instead of import selenium, you need to run from selenium import webdriver. After that, you can launch the Firefox browser with Selenium. Enter the following into the interactive shell:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> type(browser)
<class 'selenium.webdriver.firefox.webdriver.WebDriver'>
>>> browser.get('http://cyberknowledgebase.com')
After calling webdriver.Firefox() and get() in IDLE, the Firefox browser appears.
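Selenium can also find elements on the page and interact with them, which is what makes form filling and mouse clicks possible. Here is a rough sketch; the page elements and selectors are made up for illustration, and the find_element_by_* names apply to older Selenium versions (Selenium 4 uses find_element(By.CSS_SELECTOR, ...)):

>>> elem = browser.find_element_by_css_selector('input[name="q"]')   # hypothetical search box
>>> elem.send_keys('python automation')                              # type into the form field
>>> elem.submit()                                                     # submit the form
>>> link = browser.find_element_by_link_text('Read more')             # hypothetical link
>>> link.click()                                                       # simulate a mouse click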
Note: for complete information about these modules, search on Google. My motive here is just to point you to the things that are very useful for automation.
My First Contribution to the Company
Challenge: My daily work in the company included one boring task in which I needed to check the reputation of source IPs on the web. I had to copy the source IP, check it on the first website, then on the second, and so on.
Automation done:
To make this task easier, I wrote a script that automatically fetches the data from the mail (the IP whose reputation we need to analyse), and then the browser gets all the details for me with just one click.
To build it, I followed the steps below:
1: Import the modules: win32com.client, sys, os, requests, re, webbrowser (win32com is what lets Python talk to Outlook through its COM interface)
2: Get access to Outlook
3: Read the mail
4: Extract the IP with a regex
5: Use that IP to search in the browser
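To give an idea of how those steps fit together, here is a rough sketch rather than my production script: it assumes Outlook is installed, only looks at the newest mail in the Inbox, and the folder index, sort order, and regex are my assumptions.

import win32com.client, re, webbrowser

# Step 2: get access to Outlook through its COM interface
outlook = win32com.client.Dispatch('Outlook.Application').GetNamespace('MAPI')
inbox = outlook.GetDefaultFolder(6)   # 6 is the standard folder index for the Inbox

# Step 3: read the newest mail in the Inbox
messages = inbox.Items
messages.Sort('[ReceivedTime]', True)
body = messages.GetFirst().Body

# Step 4: extract the first IPv4 address with a regex (a simple pattern, adjust as needed)
ipRegex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
match = ipRegex.search(body)

# Step 5: use that IP to search in the browser (X-Force, as in the earlier example)
if match:
    webbrowser.open('https://exchange.xforce.ibmcloud.com/search/' + match.group())
else:
    print('No IP found in the latest mail')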