
Using Python Send HTTP Requests to Server - Compare Socket with urllib Package and Request Module

This Python project is a straightforward comparison between the socket module and urllib for sending HTTP requests to servers.

Socket is a low-level networking interface, while urllib.request.urlopen() does all the hard work of making a GET request to the URL provided, handling the encoding as well. What is returned is an HTTPResponse object.

The response body is returned as bytes. We can use .read() to read the content, or .headers to get all the header information. Also, with the help of an open-source library called Beautiful Soup, the bytes can be parsed into a BeautifulSoup object, a nested data structure, for further analysis of the content and scraping.
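As a minimal sketch of that parsing step (this assumes the beautifulsoup4 package is installed; the HTML bytes below are a made-up sample standing in for a real response body):

```python
from bs4 import BeautifulSoup

# Sample bytes, as if returned by HTTPResponse.read()
html_bytes = (b"<html><head><title>Demo</title></head>"
              b"<body><a href='/a'>One</a><a href='/b'>Two</a></body></html>")

# Parse the raw bytes into a nested, searchable structure
soup = BeautifulSoup(html_bytes, "html.parser")

print(soup.title.string)                        # the <title> text: Demo
print([a["href"] for a in soup.find_all("a")])  # all link targets: ['/a', '/b']
```

Beautiful Soup accepts bytes directly and detects the encoding itself, so there is no need to decode first.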

For more complicated uses of the urllib library, see the Python documentation.

For the simplest example, here is the code :

''' Compare socket with urllib for sending http requests '''

import socket
import urllib.request

# client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# client_socket.connect(('www.yahoo.com', 80))
# # the Host header takes a bare hostname, not a full URL
# request = "GET / HTTP/1.1\r\nHost: www.yahoo.com\r\nConnection: close\r\n\r\n"
# client_socket.send(request.encode())
# response = client_socket.recv(4096)
# print(f'************* {len(response)}')
# print(response.decode())
# client_socket.close()

'''urllib library is a lot easier than socket module 
for getting http requests and return responses'''

# httpResponse_data = urllib.request.urlopen('https://www.yahoo.com')
# print(f'Headers: {httpResponse_data.headers}')

# use with statement instead:
with urllib.request.urlopen('https://www.yahoo.com') as response_file:
    data = response_file.read()
    # slicing bytes can split a multi-byte character, so replace decode errors
    print(data[:1000].decode(errors='replace'))

However, when we talk about web scraping, we shouldn't skip the Requests library. Requests is an open-source Python library that makes HTTP requests more human-friendly and simple to use. It is powered by urllib3 and is used more often nowadays because of its readability, its convenient GET/POST methods, and more.
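For comparison with the urllib version above, here is a minimal sketch using Requests (this assumes the requests package is installed and the network is reachable):

```python
import requests

# One call fetches the page, follows redirects, and decodes the body
response = requests.get("https://www.yahoo.com", timeout=10)

print(response.status_code)                      # 200 on success
print(response.headers.get("Content-Type"))      # headers behave like a dict
print(response.text[:200])                       # .text is already decoded str,
                                                 # unlike urllib's raw bytes
```

Notice there is no manual .read() or .decode(): Requests handles both, which is a big part of its readability advantage.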

When scraping a website, Requests and Beautiful Soup usually join forces: Requests fetches the page, and Beautiful Soup processes and finds information on it.
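A short sketch of the two working together (this assumes both packages are installed and the network is reachable; the URL is just an illustrative example):

```python
import requests
from bs4 import BeautifulSoup

# Step 1: Requests fetches the page
page = requests.get("https://www.python.org", timeout=10)

# Step 2: Beautiful Soup parses the raw bytes of the body
soup = BeautifulSoup(page.content, "html.parser")

# Step 3: search the parsed tree, e.g. collect link targets
links = [a.get("href") for a in soup.find_all("a", href=True)]
print(links[:5])
```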


Related reading:
1. Find most frequent words in an article online
2. Scrape Ted.com videos
