Commit graph

44 commits

Author SHA1 Message Date
ORelio
05f2fb5ec7
[FeedExpander] Decode HTML entities in title (#3110)
Feed item title may contain HTML entities that we need to decode,
else they are encoded twice when generating the expanded feed.
2022-10-20 18:26:43 +02:00
Dag
27b3d7c34e
feat: improve logging and error handling (#2994)
* feat: improve logging and error handling

* trim absolute path from file name

* fix: suppress php errors from xml parsing

* fix: respect the error reporting level in the custom error handler

* feat: dont log error which is produced by bots

* ignore error about invalid bridge name

* upgrade bridge exception from warning to error

* remove remnants of using phps builin error handler

* move responsibility of printing php error from logger to error handler

* feat: include url in log record context

* fix: always include url in log record contect

Also ignore more non-interesting exceptions.

* more verbose httpexception

* fix

* fix
2022-09-08 19:07:57 +02:00
Dag
8ea9472300
feat: improve exception message when xml parsing fails (#3009) 2022-09-05 14:26:11 +02:00
Dag
2bbce8ebef
refactor: general code base refactor (#2950)
* refactor

* fix: bug in previous refactor

* chore: exclude phpcompat sniff due to bug in phpcompat

* fix: do not leak absolute paths

* refactor/fix: batch extensions checking, fix DOS issue
2022-08-06 22:46:28 +02:00
Jan Tojnar
951092eef3
Fix coding style missed by phpbcf (#2901)
$ composer require --dev friendsofphp/php-cs-fixer

$ echo >.php-cs-fixer.dist.php "<?php

$finder = PhpCsFixer\Finder::create()
    ->in(__DIR__);

$rules = [
    '@PSR12' => true,
    // '@PSR12:risky' => true,
    '@PHP74Migration' => true,
    // '@PHP74Migration:risky' => true,
    // buggy, duplicates existing comment sometimes
    'no_break_comment' => false,
    'array_syntax' => true,
    'lowercase_static_reference' => true,
    'visibility_required' => false,
    // Too much noise
    'binary_operator_spaces' => false,
    'heredoc_indentation' => false,
    'trailing_comma_in_multiline' => false,
];

$config = new PhpCsFixer\Config();

return $config
    ->setRules($rules)
    // ->setRiskyAllowed(true)
    ->setFinder($finder);

"

$ vendor/bin/php-cs-fixer --version
PHP CS Fixer 3.8.0 BerSzcz against war! by Fabien Potencier and Dariusz Ruminski.
PHP runtime: 8.1.7

$ vendor/bin/php-cs-fixer fix
$ rm .php-cs-fixer.cache
$ vendor/bin/php-cs-fixer fix
2022-07-08 13:00:52 +02:00
Dag
4f75591060
Reformat codebase v4 (#2872)
Reformat code base to PSR12

Co-authored-by: rssbridge <noreply@github.com>
2022-07-01 15:10:30 +02:00
Dag
5076d09de6
refactor: prepare for PSR2 (#2859) 2022-06-24 18:29:35 +02:00
Dag
1d0a0b927b
fix: use accept header when fetching feed (#2737)
* fix: use accept header when fetching feed

* fix: include atom too, and reuse constants from format classes

* add a catch all accept header
2022-05-18 00:18:33 +02:00
dag
dbee47f1d6
fix: give better error message when feed can't be parsed (#2618) 2022-04-10 18:54:32 +02:00
Stelfux
91b8e4196e
[FeedExpander.php] Preserve original icon (#2145) 2022-03-26 19:09:27 +01:00
ORelio
b754d14698
[FeedExpander] Handle Atom enclosures (#2039) 2021-04-04 15:21:15 +05:00
ORelio
8144488a9e [FeedExpander] Fix PHP notice on missing uri field
(guid is valid uri AND item uri is not valid)
 => (guid is valid uri AND item uri is empty or not valid)
2020-08-11 14:01:44 +02:00
ORelio
ca9c2abb60 [FeedExpander] Fix item href being used as feed uri (#1033) 2019-02-11 19:07:03 +01:00
logmanoriginal
1c17ffb5c4 [FeedExpander] Add constants for feed types 2018-11-18 16:18:40 +01:00
logmanoriginal
326cfb21cf [FeedExpander] Rename $name to $title 2018-11-18 16:11:38 +01:00
logmanoriginal
8ab1fb86a9 [FeedExpander] Let collectExpandableDatas() return self 2018-11-18 16:03:32 +01:00
logmanoriginal
c4550be812 lib: Add API documentation 2018-11-18 09:41:14 +01:00
logmanoriginal
c63af2e7ad core: Add separate Debug class
Replaces 'debugMessage' by specialized debug function 'Debug::log'.
This function takes the same arguments as the previous 'debugMessage'.

A separate Debug class allows for further optimization and separation
of concern.
2018-11-10 20:03:05 +01:00
logmanoriginal
4b7fea5ebc [RssBridge] Include interfaces once 2018-11-06 19:23:32 +01:00
ORelio
de8cee6a1c Catching up | [Main] Debug mode, parse utils, MIME | [Bridges] Add/Improve 20 bridges (#802)
* Debug mode improvements

 - Improve debug warning message
 - Restore error reporting in debug mode
 - Fix 'notice' messages for unset fields

* Add parsing utility functions

html.php
 - extractFromDelimiters
 - stripWithDelimiters
 - stripRecursiveHTMLSection
 - markdownToHtml (partial)

bridges
 - remove now-duplicate functions
 - call functions from html.php instead

* [Anidex] New bridge

Anime torrent tracker

* [Anime-Ultime] Restore thumbnail

* [CNET] Recreate bridge

Full rewrite as the previous one was broken

* [Dilbert] Minor URI fix

Use new self::URI property

* [EstCeQuonMetEnProd] Fix content extraction

Bridge was broken

* [Facebook] Fix "SpSonsSoriSsés" label

... which was taking space in item title

* [Futura-Sciences] Use HTTPS, More cleanup

Use HTTPS as FS now offer HTTPS
Clean additional useless HTML elements

* [GBATemp] Multiple fixes

- Fix categories: missing "break" statements
- Restore thumbnail as enclosure
- Fix date extraction
- Fix user blog post extraction
- Use getSimpleHTMLDOMCached

* [JapanExpo] Fix bridge, HTTPS, thumbnails

- Fix getSimpleHTMLDOMCached call
- Upgrade to HTTPS as JE now offers HTTPS
- Restore thumbnails as enclosures

* [LeMondeInformatique] Fix bridge, HTTPS

- Upgrade to HTTPS as LMI now offers HTTPS
- Restore thumbnails using small images
- Fix content extraction
- Fix text encoding issue

* [Nextgov] Fix content extraction

- Restore thumbnail and use small image
- Field extraction fixes

* [NextInpact] Add categories and filtering by type

- Offer all RSS feeds
- Allow filtering by article type
- Implement extraction for brief articles
- Remove article limit, many brief articles are publied all at once

* [NyaaTorrents] New bridge

Anime torrent tracker

* [Releases3DS] Cache content, restore thumbnail

- Use getSimpleHTMLDOMCached
- Restore thumbnail as enclosure

* [TheHackerNews] Fix bridge

 - Fix content extraction including article body
 - Restore thumbnail as enclosure

* [WeLiveSecurity] HTTPS, Fix content extraction

- Upgrade to HTTPS as WLS now offers HTTPS
- Fix content extraction including article body

* [WordPress] Reduce timeout, more content selectors

- Reduce timeout to use default one (1h)
- Add new content selector (articleBody)
- Find thumbnail and set as enclosure
- Fix <script> cleanup

* [YGGTorrent] Increase limit, use cache

- Increase item limit as uploads are very frequent
- Use getSimpleHTMLDOMCached

* [ZDNet] Rewrite with FeedExpander

- Upgrade to HTTPS as ZD now offers HTTPS
- Use FeedExpander for secondary fields
- Fix content extraction for article body

* [Main] Handle MIME type for enclosures

Many feed readers will ignore enclosures (e.g. thumbnails) with no MIME type. This commit adds automatic MIME type detection based on file extension (which may be inaccurate but is the only way without fetching the content).

One can force enclosure type using #.ext anchor (hacky, needs improving)

* [FeedExpander] Improve field extraction

- Add support for passing enclosures
- Improve author and uri extraction
- Fix 'notice' PHP error messages

* [Pull] Coding style fixes for #802

* [Pull] Implementing changes for #802

 - Fix coding style issues with str append
 - Remove useless CACHE_TIMEOUT
 - Use count() instead of $limit
 - Use defaultLinkTo() + handle strings
 - Use http_build_query()
 - Fix missing </em>
 - Remove error_reporting(0)
 - warning CSS (@LogMANOriginal)
 - Fix typo in FeedExpander comment

* [Main] More documentation for markdownToHtml

See #802 for more details
2018-09-09 20:20:13 +01:00
Walter Barrett
704a87ad97 Icons: Allow Bridge-specified icons (#788) 2018-08-21 17:46:47 +02:00
LogMANOriginal
193ca87afa [phpcs] enforce single quotes (#732)
* [phpcs] Add rule to enforce single quoted strings
2018-06-29 22:55:33 +01:00
logmanoriginal
4fb1366aaf [FeedExpander] Fix Serialization of 'SimpleXMLElement' is not allowed 2017-08-10 13:35:19 +02:00
logmanoriginal
8166e33e7f [FeedExpander] Remove whitespace from source content
Whitespace at the beginning of feeds causes parsing errors. This is
an example using an ill-formatted RSS feed:

   "XML or text declaration not at start of entity"
-- https://validator.w3.org

This commit automatically removes all proceeding and trailing white-
space from the source content before resume parsing.
2017-08-10 13:20:35 +02:00
logmanoriginal
a4b9611e66 [phpcs] Add missing rules
- Do not add spaces after opening or before closing parenthesis

  // Wrong
  if( !is_null($var) ) {
    ...
  }

  // Right
  if(!is_null($var)) {
    ...
  }

- Add space after closing parenthesis

  // Wrong
  if(true){
    ...
  }

  // Right
  if(true) {
    ...
  }

- Add body into new line
- Close body in new line

  // Wrong
  if(true) { ... }

  // Right
  if(true) {
    ...
  }

Notice: Spaces after keywords are not detected:

  // Wrong (not detected)
  // -> space after 'if' and missing space after 'else'
  if (true) {
    ...
  } else{
    ...
  }

  // Right
  if(true) {
    ...
  } else {
    ...
  }
2017-07-29 19:55:12 +02:00
Frans de Jonge
781e4f1908 [FeedExpander] Deal with empty item 2017-06-24 15:09:15 +02:00
logmanoriginal
a2108c784f [FeedExpander] Properly cast simplexml elements
This fixes a possible cause of
"Serialization of 'SimpleXMLElement' is not allowed"
reported via #487
2017-03-13 22:12:11 +01:00
logmanoriginal
512a4f292b bridges: Return parent::getURI by default 2017-02-15 19:38:32 +01:00
logmanoriginal
c4169f1579 bridges: Return parent::getName by default 2017-02-15 19:38:32 +01:00
logmanoriginal
ff83410534 style: Fix coding styles 2017-02-14 17:28:07 +01:00
logmanoriginal
49281a2ed3 [FeedExpander] Remove orphan getDescription function 2016-10-16 12:47:37 +02:00
logmanoriginal
ee7ddcf992 [FeedExpander] Fix SimplXMLElement serialization error
Previously FeedExpander stored SimpleXMLElement objects into
the items array. These objects cannot be serialized using the
serialize function.
2016-10-02 18:07:56 +02:00
Pierre Mazière
f1fb95b257 [core] extract BridgeAbstract methods to make them functions
- returnError, returnServerError, returnClientError ,debugMessage are
  moved to lib/error.php

- getContents, getSimpleHTMLDOM, getSimpleHTMLDOMCached are moved to
  lib/contents.php

Signed-off-by: Pierre Mazière <pierre.maziere@gmx.com>
2016-09-25 23:22:33 +02:00
logmanoriginal
351eb00400 [FeedExpander] Align logical operators in next line
This is a cosmetic change to apply the same standard as
in HTMLUtils
2016-09-17 20:44:31 +02:00
Pierre Mazière
ca0842ccf8 [FeedExpander] widen guid use as uri provider
Signed-off-by: Pierre Mazière <pierre.maziere@gmx.com>
2016-09-17 19:37:39 +02:00
Pierre Mazière
2744c13735 [FeedExpander] fix feeds using guid tag as item uri provider
Signed-off-by: Pierre Mazière <pierre.maziere@gmx.com>
2016-09-17 19:37:39 +02:00
logmanoriginal
1819943451 [FeedExpander] Write debug message for custom build function 2016-09-17 18:19:26 +02:00
logmanoriginal
ffc9418620 [FeedExpander] Fix typos 2016-09-17 18:16:25 +02:00
Pierre Mazière
15c422c648 [FeedExpander] implement default parseItem() method
Signed-off-by: Pierre Mazière <pierre.maziere@gmx.com>
2016-09-12 10:39:34 +02:00
Pierre Mazière
655b3d578d [FeedExpander] simplify feed type detection and store it
Signed-off-by: Pierre Mazière <pierre.maziere@gmx.com>
2016-09-12 10:38:50 +02:00
logmanoriginal
62eec43980 [core] Apply common indentation
All files are now using tabs for indentation
2016-09-10 20:41:11 +02:00
logmanoriginal
2eec89ab27 [bridges] Change all bridges to use BridgeAbstract with getSimpleHTMLDOMCached 2016-09-10 19:11:09 +02:00
logmanoriginal
f1fb527607 [FeedExpander] Add optional parameter to specify max items
Allows caller of collectExpandableDatas to request a limited
amount of items
2016-09-05 20:17:00 +02:00
logmanoriginal
298dc49c67 [lib] Split Bridge/Cache/Format into one file per class
The files have grown to a size where it is necessary to search
for a class in a file. This commit splits the content into one
file per class. RSS-Bridge will require implementations and
the implementations will require (once) the interfaces.
2016-09-05 18:05:19 +02:00