How to Download Images from a Web Page with Python

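Web scraping is a technique for extracting data from websites. Many websites give visitors no way to save their data for personal use. Copying and pasting by hand works, but it is tedious and time-consuming; web scraping automates the extraction. This article shows how to download the images on a web page with Python.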

Required Modules

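bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It is not part of the standard library and must be installed separately.
requests: Requests makes it easy to send HTTP/1.1 requests. It is also not part of the standard library.
os: The os module is part of Python's standard library. It provides a portable way to use operating-system-dependent functionality, such as creating folders.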

Approach

Import the modules.
Fetch the HTML of the page.

Use Beautiful Soup's find_all method to get the list of img tags from the HTML (findAll is an older name for the same method).

images = soup.find_all('img')
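
To make this step concrete, here is a minimal sketch that runs find_all on an inline HTML string instead of a live page. The sample HTML and the helper name list_img_tags are made up for illustration and are not part of the original program.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <img src="/static/logo.png" alt="logo">
  <img data-src="https://example.com/photo.jpg">
  <p>No image here</p>
</body></html>
"""

def list_img_tags(html):
    """Return every <img> tag found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("img")

tags = list_img_tags(SAMPLE_HTML)
for tag in tags:
    # tag.get() returns None instead of raising when an attribute is missing.
    print(tag.get("src"), tag.get("data-src"))
```

Note that the second tag has no src attribute at all, which is why the full program further down checks several attributes in turn.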

Use the mkdir function from os to create a separate folder for the downloaded images.

os.mkdir(folder_name)
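
os.mkdir raises an error when the folder already exists. As an alternative sketch (not what the program below does), os.makedirs with exist_ok=True simply reuses an existing folder. The helper name ensure_folder is an assumption made for this example, and the demonstration runs inside a temporary directory so that nothing is left behind.

```python
import os
import tempfile

def ensure_folder(folder_name):
    """Create folder_name if it does not exist yet and return its path."""
    # exist_ok=True turns the call into a no-op when the folder is already there.
    os.makedirs(folder_name, exist_ok=True)
    return folder_name

# Demonstrate inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "images")
    ensure_folder(target)
    ensure_folder(target)  # the second call does not raise
    folder_was_created = os.path.isdir(target)
    print(folder_was_created)
```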

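Loop over all the images and get the source URL of each one. Once you have the source URL, the last step is to download the image by fetching its content.

# source_url is a placeholder for the image's source URL
r = requests.get(source_url).content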
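
One thing to watch for: the source URL found in an img tag is often relative (for example /static/logo.png), and requests.get needs an absolute URL. The sketch below resolves it against the page address with urljoin from the standard library. The helper name resolve_image_url and the example.com addresses are assumptions for illustration; the program further down does not do this step.

```python
from urllib.parse import urljoin

def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)

page = "https://example.com/blog/post.html"

# A path starting with "/" is resolved against the site root.
print(resolve_image_url(page, "/static/logo.png"))
# A bare relative path is resolved against the page's own folder.
print(resolve_image_url(page, "pics/cat.jpg"))
# An absolute URL is returned unchanged.
print(resolve_image_url(page, "https://cdn.example.com/a.png"))
```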

Write the content to a file to save the image.

# Enter the file name with its extension, such as .jpg or .png
with open("File Name", "wb") as f:
    f.write(r)
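
The file name needs a sensible extension. As a rough sketch, the extension can be taken from the path part of the image URL, falling back to .jpg when the URL has none. The helper names filename_for and save_bytes are hypothetical, and placeholder bytes stand in for real downloaded content so the example runs without a network connection.

```python
import os
import tempfile
from urllib.parse import urlparse

def filename_for(image_url, index, default_ext=".jpg"):
    """Build a file name whose extension follows the URL path."""
    path = urlparse(image_url).path          # drops any ?query part
    ext = os.path.splitext(path)[1].lower()  # e.g. ".png", or "" if none
    return f"image{index}{ext or default_ext}"

def save_bytes(folder, name, data):
    """Write raw bytes to folder/name and return the full path."""
    full_path = os.path.join(folder, name)
    with open(full_path, "wb") as f:
        f.write(data)
    return full_path

# Placeholder bytes stand in for requests.get(...).content here.
fake_content = b"\x89PNG placeholder bytes"
with tempfile.TemporaryDirectory() as tmp:
    name = filename_for("https://example.com/pics/photo.PNG?size=large", 1)
    saved = save_bytes(tmp, name, fake_content)
    saved_size = os.path.getsize(saved)
    print(name, saved_size)
```

The URL extension is only a hint. A server can serve any format under any name, so this is a convenience rather than a guarantee.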

Downloading the Images on a Web Page with Python

from bs4 import BeautifulSoup
import requests
import os

# CREATE FOLDER
def folder_create(images):
	# keep asking until a new folder can be created
	while True:
		folder_name = input("Enter Folder Name:- ")
		try:
			# folder creation
			os.mkdir(folder_name)
			break

		# if the folder cannot be created (for example, one with
		# that name already exists), ask for another name
		except OSError as err:
			print(f"Could not create folder: {err}")

	# image downloading start
	download_images(images, folder_name)


# DOWNLOAD ALL IMAGES FROM THAT URL
def download_images(images, folder_name):

	# initial count is zero
	count = 0

	# print total images found in URL
	print(f"Total {len(images)} Images Found!")

	# checking if images is not zero
	if len(images) != 0:
		for i, image in enumerate(images):
			# From the image tag, fetch the image source URL.
			# The attributes are tried in this order:
			# 1.data-srcset
			# 2.data-src
			# 3.data-fallback-src
			# 4.src
			image_link = None
			for attr in ("data-srcset", "data-src", "data-fallback-src", "src"):
				if image.has_attr(attr):
					image_link = image[attr]
					break

			# if no source URL was found, skip this image
			if image_link is None:
				continue

			# After getting the image source URL,
			# try to get the content of the image
			try:
				r = requests.get(image_link).content
				try:

					# if the content decodes as UTF-8 it is text, not
					# binary image data, so it is not saved (this also
					# skips text-based formats such as SVG)
					r = str(r, 'utf-8')

				except UnicodeDecodeError:

					# the content is binary, so save it as an image
					with open(f"{folder_name}/images{i+1}.jpg", "wb") as f:
						f.write(r)

					# counting number of images downloaded
					count += 1
			except Exception:
				# any failure (bad URL, network error) skips this image
				pass

		# Some images may fail to download, so compare
		# the number downloaded with the number found
		if count == len(images):
			print("All Images Downloaded!")

		# if not all images were downloaded
		else:
			print(f"Total {count} Images Downloaded Out of {len(images)}")

# MAIN FUNCTION START
def main(url):

	# content of URL
	r = requests.get(url)

	# Parse HTML Code
	soup = BeautifulSoup(r.text, 'html.parser')

	# find all images in URL
	images = soup.find_all('img')

	# Call folder create function
	folder_create(images)


# take url
url = input("Enter URL:- ")

# CALL MAIN FUNCTION
main(url)
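
One limitation of the program above: a data-srcset value usually lists several candidate URLs, each optionally followed by a width or density descriptor, so passing the whole string to requests.get fails and that image is skipped. The sketch below assumes data-srcset follows the same comma-separated format as the standard srcset attribute and keeps only the first URL. The helper name first_url_from_srcset is hypothetical, and the simple split would mis-handle URLs that themselves contain commas.

```python
def first_url_from_srcset(srcset):
    """Return the first URL listed in a srcset-style attribute value."""
    # Candidates are comma-separated; each one is "URL [descriptor]".
    first_candidate = srcset.split(",")[0].strip()
    if not first_candidate:
        return None
    # Drop the optional descriptor such as "2x" or "480w".
    return first_candidate.split()[0]

example = "https://example.com/small.jpg 480w, https://example.com/large.jpg 1080w"
print(first_url_from_srcset(example))
```

In the program this could be applied to image_link whenever the value came from data-srcset, before the download is attempted.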
