I am writing this post after completing 3 weeks of learning Python, and guys, I am seriously impressed with this language.
You know, the best part is that this language helps you talk to almost any API and has lots of modules. For example:
- Webbrowser. Comes with Python and opens a browser to a specific page.
- Requests. Downloads files and web pages from the Internet.
- Beautiful Soup. Parses HTML, the format that web pages are written in.
- Selenium. Launches and controls a web browser. Selenium is able to fill in forms and simulate mouse clicks in this browser.
Web Browser Module
The webbrowser.open() function can launch a new browser to a specified URL. Enter the following into the interactive shell:
>>> import webbrowser
>>> webbrowser.open('http://cyberknowledgebase.com/')
Downloading Files from the Web with the requests Module
The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression. The
requests module doesn't come with Python, so you'll have to install it first. From the command line, run
pip install requests.
>>> import requests
If no error messages show up, then the
requests module has been successfully installed.
Downloading a Web Page with the requests.get() Function
The requests.get() function takes a string of a URL to download. By calling type() on
requests.get()'s return value, you can see that it returns a
Response object, which contains the response that the web server gave for your request. I'll explain the
Response object in more detail later, but for now, enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests
>>> res = requests.get('Enter URL from which you want to download')
>>> res.status_code == requests.codes.ok
True
Checking for Errors
As you’ve seen, the
Response object has a
status_code attribute that can be checked against
requests.codes.ok to see whether the download succeeded. A simpler way to check for success is to call the
raise_for_status() method on the
Response object. This will raise an exception if there was an error downloading the file and will do nothing if the download succeeded. Enter the following into the interactive shell:
>>> res = requests.get('http://cyberknowledgebase.com')
>>> res.raise_for_status()
Traceback (most recent call last):
  File "<pyshell#138>", line 1, in <module>
    res.raise_for_status()
  File "C:\Python34\lib\site-packages\requests\models.py", line 773, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found
The raise_for_status() method is a good way to ensure that a program halts if a bad download occurs. This is a good thing: you want your program to stop as soon as some unexpected error happens. If a failed download isn't a deal breaker for your program, you can wrap the
raise_for_status() line with try and
except statements to handle this error case without crashing.
import requests
res = requests.get('http://inventwithpython.com/page_that_does_not_exist')
try:
    res.raise_for_status()
except Exception as exc:
    print('There was a problem: %s' % (exc))
This raise_for_status() method call causes the program to output the following:
There was a problem: 404 Client Error: Not Found
Always call raise_for_status() after calling
requests.get(). You want to be sure that the download has actually worked before your program continues.
Saving Downloaded Files to the Hard Drive
From here, you can save the web page to a file on your hard drive with the standard
open() function and
write() method. There are some slight differences, though. First, you must open the file in write binary mode by passing the string
'wb' as the second argument to
open(). Even if the page is in plaintext (such as a simple .txt file), you need to write binary data instead of text data in order to maintain the Unicode encoding of the text.
To write the web page to a file, you can use a
for loop with the Response object's iter_content() method:
>>> import requests
>>> res = requests.get('https://cyberknowledgebase.com/abc.txt')
>>> res.raise_for_status()
>>> playFile = open('abc.txt', 'wb')
>>> for chunk in res.iter_content(100000):
        playFile.write(chunk)

100000
78981
>>> playFile.close()
The iter_content() method returns "chunks" of the content on each iteration through the loop. Each chunk is of the bytes data type, and you get to specify how many bytes each chunk will contain. One hundred thousand bytes is generally a good size, so pass
100000 as the argument to iter_content().
The file abc.txt will now exist in the current working directory. Note that the filename you pass to open() doesn't have to match the filename on the website; you can save the downloaded content under any name you like. The
requests module simply handles downloading the contents of web pages. Once the page is downloaded, it is simply data in your program. Even if you were to lose your Internet connection after downloading the web page, all the page data would still be on your computer.
The write() method returns the number of bytes written to the file, which is why the numbers 100000 and 78981 appeared in the interactive shell output above: the first chunk contained 100,000 bytes, and the remaining part of the file needed only 78,981 bytes.
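You can see this return value for yourself with a quick experiment (demo.bin is a throwaway filename I made up for this demo):

```python
# write() on a file opened in binary mode returns how many bytes were written.
with open('demo.bin', 'wb') as demoFile:
    count = demoFile.write(b'hello world')
print(count)  # 11 -- one byte per character in this ASCII string
```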
To review, here’s the complete process for downloading and saving a file:
- Call requests.get() to download the file.
- Call open() with 'wb' to create a new file in write binary mode.
- Loop over the Response object's iter_content() method.
- Call write() on each iteration to write the content to the file.
- Call close() to close the file.
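Putting those five steps together, the whole process fits in a few lines. This is just a sketch: the URL and output filename below are placeholders I chose, not endpoints from the examples above.

```python
import requests

# Placeholder URL -- substitute the file you actually want to download.
url = 'https://example.com/'
res = requests.get(url)                 # step 1: download the file
res.raise_for_status()                  # halt here if the download failed
pageFile = open('page.html', 'wb')      # step 2: open a file in write binary mode
for chunk in res.iter_content(100000):  # step 3: loop over iter_content()
    pageFile.write(chunk)               # step 4: write each chunk to the file
pageFile.close()                        # step 5: close the file
```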
That's all there is to the
requests module! The
for loop and
iter_content() stuff may seem complicated compared to the
open()/write()/close() workflow you've been using to write text files, but it's to ensure that the
requests module doesn't eat up too much memory even if you download massive files.
Example to download X-Force data
import requests, sys, bs4, webbrowser

print("X-Force results")
x = input("Enter the IP\n")
res = requests.get('https://exchange.xforce.ibmcloud.com/search/' + x)
res.raise_for_status()
malware_response = open('res', 'wb')
for malware_data in res.iter_content(100000):
    malware_response.write(malware_data)
malware_response.close()
Parsing HTML with the BeautifulSoup Module
Beautiful Soup is a module for extracting information from an HTML page (and is much better for this purpose than regular expressions). The
BeautifulSoup module’s name is
bs4 (for Beautiful Soup, version 4). To install it, you will need to run
pip install beautifulsoup4 from the command line. To import Beautiful Soup, you run import bs4.
<!-- This is the example.html example file. -->
<html><head><title>Cyber Knowledge Base</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://cyberknowledgebase.com">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Davinder</span></p>
</body></html>
As you can see, even a simple HTML file involves many different tags and attributes, and matters quickly get confusing with complex websites. Thankfully, Beautiful Soup makes working with HTML much easier.
Creating a BeautifulSoup Object from HTML
The bs4.BeautifulSoup() function needs to be called with a string containing the HTML it will parse. The value that the
bs4.BeautifulSoup() function returns is a
BeautifulSoup object. Enter the following into the interactive shell while your computer is connected to the Internet:
>>> import requests, bs4
>>> res = requests.get('https://cyberknowledgebase.com')
>>> res.raise_for_status()
>>> noStarchSoup = bs4.BeautifulSoup(res.text)
>>> type(noStarchSoup)
<class 'bs4.BeautifulSoup'>
This code uses
requests.get() to download the main page from my website and then passes the
text attribute of the response to bs4.BeautifulSoup(). The
BeautifulSoup object that it returns is stored in a variable named noStarchSoup.
You can also load an HTML file from your hard drive by passing a
File object to
bs4.BeautifulSoup(). Enter the following into the interactive shell (make sure the example.html file is in the working directory):
>>> exampleFile = open('example.html')
>>> exampleSoup = bs4.BeautifulSoup(exampleFile)
>>> type(exampleSoup)
<class 'bs4.BeautifulSoup'>
Once you have a
BeautifulSoup object, you can use its methods to locate specific parts of an HTML document.
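For instance, here is a minimal sketch of pulling the author name out of the example.html markup shown earlier, using the select() method with a CSS selector (this assumes bs4 is installed; I've inlined a trimmed-down copy of the HTML so the snippet is self-contained):

```python
import bs4

# Trimmed-down example.html markup, inlined as a string for a self-contained demo.
exampleHtml = '''<html><head><title>Cyber Knowledge Base</title></head>
<body><p>By <span id="author">Al Davinder</span></p></body></html>'''

exampleSoup = bs4.BeautifulSoup(exampleHtml, 'html.parser')
# select() takes a CSS selector; '#author' matches the element with id="author".
elems = exampleSoup.select('#author')
print(elems[0].getText())  # Al Davinder
```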
The requests and BeautifulSoup modules are great as long as you can figure out the URL you need to pass to
requests.get(). However, sometimes this isn’t so easy to find. Or perhaps the website you want your program to navigate requires you to log in first. The
selenium module will give your programs the power to perform such sophisticated tasks.
Controlling the Browser with the selenium Module
Importing the modules for Selenium is slightly tricky. Instead of
import selenium, you need to run
from selenium import webdriver. After that, you can launch the Firefox browser with Selenium. Enter the following into the interactive shell:
>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> type(browser)
<class 'selenium.webdriver.firefox.webdriver.WebDriver'>
>>> browser.get('http://cyberknowledgebase.com')
After you call webdriver.Firefox() and get() in IDLE, the Firefox browser appears.
Note: For complete info about these modules, search on Google, as my motive is to give you the information on the things that are very useful for automation.
My First contribution to Company
Challenge: My daily work at the company involved one boring task in which I needed to check the reputation of source IPs on the web. I had to copy the source IP, then check it on the first website, then on the second, and so on…
Automation Done :
To make this task easier, I have written a script that automatically fetches data from the mail, like the details of the IP whose reputation we need to analyze. Then the browser automatically gets all the details for me with just one click.
To build it, I followed the steps below:
1: Import modules: win32com.client, sys, os, requests, re, webbrowser
2: Get access to Outlook
3: Read the mail
4: Extract the IP with a regex
5: Use that IP to search in the browser
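Step 4 above can be sketched with the re module. The mail text below is a made-up sample, and the regex is a simple IPv4 pattern that doesn't validate that each octet is 255 or less, which is good enough for extraction:

```python
import re

# Four groups of 1-3 digits separated by dots.
ipRegex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

mailBody = 'Alert: suspicious traffic seen from 203.0.113.42 at 10:32.'  # sample text
match = ipRegex.search(mailBody)
if match:
    ip = match.group()
    print(ip)  # 203.0.113.42
    # Step 5: hand the IP to the browser, e.g.
    # webbrowser.open('https://exchange.xforce.ibmcloud.com/search/' + ip)
```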